
Posted to server-dev@james.apache.org by "Niklas Therning (JIRA)" <se...@james.apache.org> on 2008/07/21 11:35:31 UTC

[jira] Created: (MIME4J-62) Unnecessary qp encoding of SPACE and TAB characters in CodecUtil

Unnecessary qp encoding of SPACE and TAB characters in CodecUtil
----------------------------------------------------------------

                 Key: MIME4J-62
                 URL: https://issues.apache.org/jira/browse/MIME4J-62
             Project: Mime4j
          Issue Type: Bug
    Affects Versions: 0.4
            Reporter: Niklas Therning
            Priority: Minor
             Fix For: 0.4


ATM we always encode SPACE and TAB. The result is that the output of the encoding is longer than necessary. According to the MIME RFC:

(3)   (White Space) Octets with values of 9 and 32 MAY be
          represented as US-ASCII TAB (HT) and SPACE characters,
          respectively, but MUST NOT be so represented at the end
          of an encoded line.  Any TAB (HT) or SPACE characters
          on an encoded line MUST thus be followed on that line
          by a printable character.  In particular, an "=" at the
          end of an encoded line, indicating a soft line break
          (see rule #5) may follow one or more TAB (HT) or SPACE
          characters.  It follows that an octet with decimal
          value 9 or 32 appearing at the end of an encoded line
          must be represented according to Rule #1.  This rule is
          necessary because some MTAs (Message Transport Agents,
          programs which transport messages from one user to
          another, or perform a portion of such transfers) are
          known to pad lines of text with SPACEs, and others are
          known to remove "white space" characters from the end
          of a line.  Therefore, when decoding a Quoted-Printable
          body, any trailing white space on a line must be
          deleted, as it will necessarily have been added by
          intermediate transport agents.

To make the encoded output as short as possible we should try to not encode SPACE and TAB unless they are the last character in a line.
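
A minimal Java sketch of rule (3) as quoted above. It assumes the input is the octets of a single line with the CRLF already removed, and it leaves out soft line breaks (rule #5). The class and method names are hypothetical; this is not the CodecUtil code.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of rule (3); not the actual Mime4j CodecUtil code.
final class QpWhitespaceSketch {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    private QpWhitespaceSketch() {}

    // Encodes the octets of ONE line (its CRLF already removed).
    // Assumption: the line is short enough that no soft line break is needed.
    static String encodeLine(byte[] line) {
        StringBuilder out = new StringBuilder(line.length);
        for (int i = 0; i < line.length; i++) {
            int b = line[i] & 0xFF;
            boolean whitespace = (b == 9 || b == 32);
            boolean lastOnLine = (i == line.length - 1);
            // Printable US-ASCII except "=" may be written literally.
            boolean printable = (b >= 33 && b <= 126 && b != '=');
            if (printable || (whitespace && !lastOnLine)) {
                out.append((char) b);
            } else {
                // "=" followed by two uppercase hex digits
                out.append('=').append(HEX[b >> 4]).append(HEX[b & 0x0F]);
            }
        }
        return out.toString();
    }

    // Convenience overload for US-ASCII test input.
    static String encodeLine(String asciiLine) {
        return encodeLine(asciiLine.getBytes(StandardCharsets.US_ASCII));
    }
}
```

With this sketch only the last octet of a line is escaped when it is SPACE or TAB, so "end " becomes "end=20" while the inner space in "a b" stays literal.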


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: server-dev-unsubscribe@james.apache.org
For additional commands, e-mail: server-dev-help@james.apache.org

[jira] Commented: (MIME4J-62) Unnecessary qp encoding of SPACE and TAB characters in CodecUtil

Posted by "Robert Burrell Donkin (JIRA)" <se...@james.apache.org>.

    [ https://issues.apache.org/jira/browse/MIME4J-62?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12615307#action_12615307 ] 

Robert Burrell Donkin commented on MIME4J-62:
---------------------------------------------

Niklas - that sounds like a good plan. IIRC MULTIPART_WITH_BINARY_ATTACHMENTS may be used in several tests, so it might be easier to create a copy for this test and just use that.

The extra CRLF rings a bell. IIRC I had trouble removing it before but yes, IMO it needs fixing.

Robert 


[jira] Resolved: (MIME4J-62) Unnecessary qp encoding of SPACE and TAB characters in CodecUtil

Posted by "Niklas Therning (JIRA)" <se...@james.apache.org>.

     [ https://issues.apache.org/jira/browse/MIME4J-62?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Niklas Therning resolved MIME4J-62.
-----------------------------------

    Resolution: Fixed


[jira] Commented: (MIME4J-62) Unnecessary qp encoding of SPACE and TAB characters in CodecUtil

Posted by "Stefano Bagnara (JIRA)" <se...@james.apache.org>.

    [ https://issues.apache.org/jira/browse/MIME4J-62?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12615253#action_12615253 ] 

Stefano Bagnara commented on MIME4J-62:
---------------------------------------

The special rule for SPACE and TAB is not so much about "space optimization" as about keeping most of the encoded text easy to read.
E.g.:
This is a message having also 8bit euro € char
is currently converted to
This=20is=20a=20message=20having=20also=208bit=20euro=20=A4=20char
while the "optimized" version would be:
This is a message having also 8bit euro =A4 char

The MIME specification is careful about graceful degradation when the content is read by non-MIME readers or by agents that have issues with charsets, decoding and similar things.

So, I agree with Niklas and I think this issue is a good wish, but there is no need to work on this improvement if other committers think it would be bad to have such behaviour.


[jira] Commented: (MIME4J-62) Unnecessary qp encoding of SPACE and TAB characters in CodecUtil

Posted by "Stefano Bagnara (JIRA)" <se...@james.apache.org>.

    [ https://issues.apache.org/jira/browse/MIME4J-62?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12615214#action_12615214 ] 

Stefano Bagnara commented on MIME4J-62:
---------------------------------------

Where do the current binaryQuotedPrintable and Base64 encoders come from?
I've not been able to track this down: should we reuse the commons-net codecs for these "common" output streams? (Maybe by copying them to our codebase, as we may want to fix/alter them and not depend on commons-net for this.)

I checked mime4j 0.3 and, if I'm not missing anything, they were not there (we handled temporary files differently).

I think this code was introduced by Robert 8 weeks ago:
https://issues.apache.org/jira/browse/MIME4J-37?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
http://svn.apache.org/viewvc?view=rev&revision=660013
http://svn.apache.org/viewvc?view=rev&revision=660206

Robert, did you write that code from scratch? Should we try to fix it or instead reuse some other ASF code?
Wouldn't it be better to move the output streams to top-level classes?


[jira] Commented: (MIME4J-62) Unnecessary qp encoding of SPACE and TAB characters in CodecUtil

Posted by "Stefano Bagnara (JIRA)" <se...@james.apache.org>.

    [ https://issues.apache.org/jira/browse/MIME4J-62?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12615297#action_12615297 ] 

Stefano Bagnara commented on MIME4J-62:
---------------------------------------

I'm not sure "_" is correct for SPACE in standard quoted-printable. I thought it was only a valid char in Q-strings (=?charset?encoded style), but take this only as a hint. I'm far from being a MIME expert.

For the CRLF issue, yes, please commit it!
For the 3 failing tests, just find a solution that Robert likes: he already attached the test here, so the 3 failing tests could be deleted.
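
To illustrate the "_" question: in the "Q" encoding used for encoded-words in headers (RFC 2047) an underscore stands for SPACE, while in a quoted-printable body it is an ordinary character. The decoder sketch below shows only that difference. The class name, the boolean flag and the choice of ISO-8859-1 for the decoded octets are assumptions made for illustration, not Mime4j API.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; names are not taken from Mime4j.
final class UnderscoreSketch {
    private UnderscoreSketch() {}

    // Decodes "=XX" escapes. When qEncoding is true, "_" is also mapped to
    // SPACE (header "Q" encoding); when false, "_" stays a literal underscore
    // (quoted-printable body). Soft line breaks are not handled here.
    static String decode(String encoded, boolean qEncoding) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < encoded.length(); i++) {
            char c = encoded.charAt(i);
            if (c == '=' && i + 2 < encoded.length()) {
                int hi = Character.digit(encoded.charAt(i + 1), 16);
                int lo = Character.digit(encoded.charAt(i + 2), 16);
                if (hi >= 0 && lo >= 0) {
                    out.write((hi << 4) | lo);
                    i += 2; // skip the two hex digits just consumed
                    continue;
                }
            }
            out.write(c == '_' && qEncoding ? ' ' : c);
        }
        // Assumption: decoded octets are interpreted as ISO-8859-1.
        return new String(out.toByteArray(), StandardCharsets.ISO_8859_1);
    }
}
```

Here decode("a_b=20c", false) keeps the underscore, whereas decode("a_b=20c", true) turns it into a space.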


[jira] Commented: (MIME4J-62) Unnecessary qp encoding of SPACE and TAB characters in CodecUtil

Posted by "Stefano Bagnara (JIRA)" <se...@james.apache.org>.

    [ https://issues.apache.org/jira/browse/MIME4J-62?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12615274#action_12615274 ] 

Stefano Bagnara commented on MIME4J-62:
---------------------------------------

MY OPINION is that rules #3, #4 and #5 are not for space optimization but for better representation of the content when decoding is not possible. But my opinion is not important in resolving this issue.

There are 3 tests, which I see are in MessageWriteToTest:
> - testBinaryAttachmentLenient
> - testBinaryAttachmentStrictError
> - testBinaryAttachmentStrictIgnore

The expected results in these tests assume a quoted-printable encoder that supports at least rules #3 and #5 of RFC 1521.

Either we add these features or we change the expected result
(of course it is simpler to change the expected result).
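
For reference, rule #5 (soft line breaks) limits an encoded line to 76 characters, not counting the CRLF, and marks a break with a trailing "=". A rough Java sketch follows, assuming the input is one already-encoded line in which every "=" starts an "=XX" escape. The class and method names are hypothetical.

```java
// Hypothetical sketch of rule #5 (soft line breaks); not Mime4j code.
final class SoftBreakSketch {
    private static final int MAX_LINE = 76; // limit excludes the trailing CRLF

    private SoftBreakSketch() {}

    // Takes one already-encoded line (no CRLF inside) and inserts "=" CRLF
    // so that no output line is longer than 76 characters.
    static String wrap(String encodedLine) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        while (encodedLine.length() - pos > MAX_LINE) {
            // Leave room for the "=" that marks the soft break.
            int end = pos + MAX_LINE - 1;
            // Do not cut an "=XX" escape in half.
            if (encodedLine.charAt(end - 1) == '=') {
                end -= 1;
            } else if (encodedLine.charAt(end - 2) == '=') {
                end -= 2;
            }
            out.append(encodedLine, pos, end).append("=\r\n");
            pos = end;
        }
        return out.append(encodedLine, pos, encodedLine.length()).toString();
    }
}
```

A line of exactly 76 characters is returned unchanged; anything longer is split into chunks of at most 75 characters plus the "=" marker.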

I tried this locally and it seems there is another bug where an extra CRLF sequence is added during round-tripping. Maybe it is a problem in QuotedPrintableInputStream or in MimeBoundaryInputStream; no clue yet.


[jira] Commented: (MIME4J-62) Unnecessary qp encoding of SPACE and TAB characters in CodecUtil

Posted by "Niklas Therning (JIRA)" <se...@james.apache.org>.

    [ https://issues.apache.org/jira/browse/MIME4J-62?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12615278#action_12615278 ] 

Niklas Therning commented on MIME4J-62:
---------------------------------------

It's not trivial to implement rule #3. It seems commons-codec doesn't implement it either: http://commons.apache.org/codec/apidocs/org/apache/commons/codec/net/QuotedPrintableCodec.html

I suggest that we change the expected output of the 3 tests to make the tests pass. Then this issue won't be as important. As Robert pointed out, the encodeQuotedPrintableBinary() method is meant for encoding of binary data. It also encodes line endings which isn't desirable for textual data. I think we should add a new method, encodeQuotedPrintable() or maybe encodeQuotedPrintableText(), which returns more readable output and is intended for textual data. WDYT?

Stefano, how about creating a new issue for the CRLF problem?
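
The binary/text distinction described above could look roughly like the following Java sketch. To keep the focus on line endings it always escapes SPACE and TAB (which is always permitted) and leaves out soft line breaks. The encode method and its binary flag are illustrative only; they are not the CodecUtil methods named in this comment.

```java
// Hypothetical sketch; encode(...) and its "binary" flag are illustrative
// and are not the CodecUtil methods discussed in this issue.
final class LineEndingSketch {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    private LineEndingSketch() {}

    // binary == true : every CR and LF octet is escaped (=0D, =0A).
    // binary == false: a CR LF pair is copied through as a hard line break.
    // SPACE and TAB are always escaped in this sketch.
    static String encode(byte[] data, boolean binary) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < data.length; i++) {
            int b = data[i] & 0xFF;
            boolean crlf = b == '\r' && i + 1 < data.length && data[i + 1] == '\n';
            if (!binary && crlf) {
                out.append("\r\n");
                i++; // skip the LF that was just written
            } else if (b >= 33 && b <= 126 && b != '=') {
                out.append((char) b);
            } else {
                out.append('=').append(HEX[b >> 4]).append(HEX[b & 0x0F]);
            }
        }
        return out.toString();
    }
}
```

For the octets of "a b" CR LF "c", the binary mode produces a single line containing =0D=0A, while the text mode keeps the CRLF as a real line break.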


[jira] Commented: (MIME4J-62) Unnecessary qp encoding of SPACE and TAB characters in CodecUtil

Posted by "Robert Burrell Donkin (JIRA)" <se...@james.apache.org>.

    [ https://issues.apache.org/jira/browse/MIME4J-62?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12615268#action_12615268 ] 

Robert Burrell Donkin commented on MIME4J-62:
---------------------------------------------

binaryQuotedPrintable is intended to be used only for non-textual data. So this is just about space optimisation. Most other implementations use a wrapped stream approach so I'm not sure that a straight replacement would be available. 


[jira] Commented: (MIME4J-62) Unnecessary qp encoding of SPACE and TAB characters in CodecUtil

Posted by "Robert Burrell Donkin (JIRA)" <se...@james.apache.org>.

    [ https://issues.apache.org/jira/browse/MIME4J-62?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12615395#action_12615395 ] 

Robert Burrell Donkin commented on MIME4J-62:
---------------------------------------------

I've committed a first attempt at an encoder suitable for text. The CRLF issues need to be resolved before the test will pass.

More testing is required of the implementation.


[jira] Closed: (MIME4J-62) Unnecessary qp encoding of SPACE and TAB characters in CodecUtil

Posted by "Niklas Therning (JIRA)" <se...@james.apache.org>.

     [ https://issues.apache.org/jira/browse/MIME4J-62?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Niklas Therning closed MIME4J-62.
---------------------------------



[jira] Updated: (MIME4J-62) Unnecessary qp encoding of SPACE and TAB characters in CodecUtil

Posted by "Robert Burrell Donkin (JIRA)" <se...@james.apache.org>.

     [ https://issues.apache.org/jira/browse/MIME4J-62?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Burrell Donkin updated MIME4J-62:
----------------------------------------

    Attachment: TextAttachmentEncodingTest.java

These tests capture the correct behaviour well.


[jira] Commented: (MIME4J-62) Unnecessary qp encoding of SPACE and TAB characters in CodecUtil

Posted by "Robert Burrell Donkin (JIRA)" <se...@james.apache.org>.

    [ https://issues.apache.org/jira/browse/MIME4J-62?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12615279#action_12615279 ] 

Robert Burrell Donkin commented on MIME4J-62:
---------------------------------------------

IMHO the correct behaviour is to use an encoder suited to text for text attachments. IIRC the binary one uses the algorithm recommended for binary representations.
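
To make the text/binary distinction concrete, here is a hedged Java sketch with a single hypothetical method, encode(data, binary). It is not the Mime4j encoder. It assumes that binary mode escapes every SPACE, TAB, CR and LF, and that text mode keeps CRLF as a real line break and keeps whitespace literal except at the end of a line. Soft line breaks are left out, so output lines may exceed the 76-character limit.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not Mime4j code: text mode versus binary mode.
public class QpModeSketch {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    public static String encode(byte[] data, boolean binary) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < data.length; i++) {
            int b = data[i] & 0xFF;
            boolean crlf = b == '\r' && i + 1 < data.length && data[i + 1] == '\n';
            if (!binary && crlf) {
                // Text mode: CRLF is a line break and is written as-is.
                out.append("\r\n");
                i++;
            } else if (!binary && (b == ' ' || b == '\t') && !endsLine(data, i)) {
                // Text mode: whitespace inside a line stays literal.
                out.append((char) b);
            } else if (b >= 33 && b <= 126 && b != '=') {
                out.append((char) b);
            } else {
                // Binary mode lands here for SPACE, TAB, CR and LF as well.
                out.append('=').append(HEX[b >> 4]).append(HEX[b & 0x0F]);
            }
        }
        return out.toString();
    }

    /** True if the octet at index i is the last one before a CRLF or the end of data. */
    private static boolean endsLine(byte[] data, int i) {
        if (i == data.length - 1) {
            return true;
        }
        return data[i + 1] == '\r' && i + 2 < data.length && data[i + 2] == '\n';
    }

    public static void main(String[] args) {
        byte[] data = "a b \r\nc".getBytes(StandardCharsets.US_ASCII);
        System.out.println(encode(data, false));
        System.out.println(encode(data, true));
    }
}
```

The binary variant is longer but round-trips arbitrary octets; the text variant is shorter and stays readable.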


[jira] Commented: (MIME4J-62) Unnecessary qp encoding of SPACE and TAB characters in CodecUtil

Posted by "Robert Burrell Donkin (JIRA)" <se...@james.apache.org>.

    [ https://issues.apache.org/jira/browse/MIME4J-62?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12615250#action_12615250 ] 

Robert Burrell Donkin commented on MIME4J-62:
---------------------------------------------

IIRC I created them from scratch, aiming for clarity and correctness rather than optimising for space.

I don't think there's anything to be gained by copying alternative implementations. I don't think the complexity of these algorithms justifies adding more dependencies.

The use of private inner classes intentionally hides the algorithm.


[jira] Commented: (MIME4J-62) Unnecessary qp encoding of SPACE and TAB characters in CodecUtil

Posted by "Stefano Bagnara (JIRA)" <se...@james.apache.org>.

    [ https://issues.apache.org/jira/browse/MIME4J-62?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12615282#action_12615282 ] 

Stefano Bagnara commented on MIME4J-62:
---------------------------------------

I agree that a specific encoder for the text parts is the preferred behaviour.
BTW, the 3 failing tests represent a text/plain part, so they will be affected by a similar change anyway.

About the CRLF problem: I would add an issue, but I still have to investigate it further.
We won't forget about it anyway, because we have the failing tests. Whoever takes care of updating the failing tests/expected results for this can either leave the CRLF failure there or add a JIRA issue, depending on their preference.


[jira] Commented: (MIME4J-62) Unnecessary qp encoding of SPACE and TAB characters in CodecUtil

Posted by "Niklas Therning (JIRA)" <se...@james.apache.org>.

    [ https://issues.apache.org/jira/browse/MIME4J-62?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12615416#action_12615416 ] 

Niklas Therning commented on MIME4J-62:
---------------------------------------

Great work Robert! It seems to be working fine. I've committed the code which removes the extra CRLFs. The 3 tests pass now. I'll close this issue now.


[jira] Commented: (MIME4J-62) Unnecessary qp encoding of SPACE and TAB characters in CodecUtil

Posted by "Robert Burrell Donkin (JIRA)" <se...@james.apache.org>.

    [ https://issues.apache.org/jira/browse/MIME4J-62?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12615286#action_12615286 ] 

Robert Burrell Donkin commented on MIME4J-62:
---------------------------------------------

Niklas - I'm in general agreement

Adding an encodeQuotedPrintable() method with good support for text sounds like the right approach. Since the failing tests accurately test the proper behaviour, let's comment them out for the moment.

Robert


[jira] Commented: (MIME4J-62) Unnecessary qp encoding of SPACE and TAB characters in CodecUtil

Posted by "Stefano Bagnara (JIRA)" <se...@james.apache.org>.

    [ https://issues.apache.org/jira/browse/MIME4J-62?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12615252#action_12615252 ] 

Stefano Bagnara commented on MIME4J-62:
---------------------------------------

Robert, the escaping of SPACE and TAB everywhere simply breaks 3 tests in MessageWriteToTest.
So you are free to decide whether the expected output for the tests is bad or whether the code has to be improved.

About using commons-net classes: maybe we could have copied and pasted their sources, since their code is probably better tested than code written from scratch. In fact we have already found 3 issues in the QP output stream. That's why I would probably take the code from commons-net. But I'm fine with whatever solution you propose.


[jira] Commented: (MIME4J-62) Unnecessary qp encoding of SPACE and TAB characters in CodecUtil

Posted by "Niklas Therning (JIRA)" <se...@james.apache.org>.

    [ https://issues.apache.org/jira/browse/MIME4J-62?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12615289#action_12615289 ] 

Niklas Therning commented on MIME4J-62:
---------------------------------------

Stefano, the extra CRLF you see comes from TempFileTextBody.writeTo(). For the QP case I think it's incorrect for TempFileTextBody to do that (TempFileBinaryBody does it too), as it will actually alter the body. If e.g. a PNG is encoded like this it may not render properly when decoded.

If it's ok with you guys I'll change all SPACEs to _ for now and remove the CRLF at the end of the QP encoded part of the MULTIPART_WITH_BINARY_ATTACHMENTS test message. I'll also change TempFileTextBody and TempFileBinaryBody so that they no longer add the extra CRLF after outputting a QP encoded body. This will fix the failing tests. Then we can come back to this issue later to make an alternative encoder for textual data. How about that?
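
To show why the trailing CRLF matters, here is a small Java sketch with a deliberately minimal, hypothetical QP decoder. It is not Mime4j code and it skips the trailing-whitespace handling a real decoder needs. It only demonstrates that a CRLF appended after an encoded body comes back as two extra octets.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch, not Mime4j code: a bare-bones QP decoder used to
// show that an appended CRLF becomes part of the decoded data.
public class TrailingCrlfSketch {
    public static byte[] decode(String qp) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < qp.length(); i++) {
            char c = qp.charAt(i);
            if (c != '=') {
                // Literal octet, including a literal CR or LF.
                out.write(c);
            } else if (qp.startsWith("\r\n", i + 1)) {
                // Soft line break: "=" followed by CRLF decodes to nothing.
                i += 2;
            } else {
                // "=" followed by two hex digits decodes to one octet.
                out.write(Integer.parseInt(qp.substring(i + 1, i + 3), 16));
                i += 2;
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String encoded = "=89PNG";                 // first four octets of a PNG signature
        byte[] clean = decode(encoded);
        byte[] padded = decode(encoded + "\r\n");  // what an extra CRLF would produce
        System.out.println(clean.length + " vs " + padded.length);
    }
}
```

For text the two extra octets are usually harmless, but for binary content the decoded data no longer matches the original.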


[jira] Commented: (MIME4J-62) Unnecessary qp encoding of SPACE and TAB characters in CodecUtil

Posted by "Robert Burrell Donkin (JIRA)" <se...@james.apache.org>.

    [ https://issues.apache.org/jira/browse/MIME4J-62?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12615402#action_12615402 ] 

Robert Burrell Donkin commented on MIME4J-62:
---------------------------------------------

I've finished working in this area for now. I think there's just the CRLF issue which needs a resolution.

Robert
