Posted to mime4j-dev@james.apache.org by "Robert Burrell Donkin (JIRA)" <mi...@james.apache.org> on 2009/02/06 22:04:59 UTC

[jira] Created: (MIME4J-112) Define Limits Of Round Tripping In Mime4J

Define Limits Of Round Tripping In Mime4J
-----------------------------------------

                 Key: MIME4J-112
                 URL: https://issues.apache.org/jira/browse/MIME4J-112
             Project: JAMES Mime4j
          Issue Type: Task
    Affects Versions: 0.6
            Reporter: Robert Burrell Donkin
             Fix For: 0.7


By round tripping, I mean parsing some MIME document into a fully decomposed form and then recreating a new version of the document from this form. 

In theory, Mime4J decomposition and recomposition could be made perfect with no loss of information. In other words, given a MIME document, the parser could completely decompose the document and a bitwise identical copy could be recomposed.

In practice, the limits of support are questionable. Some limitations may be expedient. For example, perhaps comments and encoding of ASCII characters are not sufficiently important to be worth preserving. Other limitations may arise from MIME documents which are not strictly compliant with the specification - for example, the use of unescaped non-ASCII characters in MIME headers may mean that the output would need to be escaped to ensure compliance.

It is important to define and describe the limits of round tripping so that users and developers are clear about the level of support Mime4J claims. In addition, sufficient unit tests should be created to ensure with confidence that documents within these limits are correctly handled.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MIME4J-112) Define Limits Of Round Tripping In Mime4J

Posted by "Markus Wiederkehr (JIRA)" <mi...@james.apache.org>.
    [ https://issues.apache.org/jira/browse/MIME4J-112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12671454#action_12671454 ] 

Markus Wiederkehr commented on MIME4J-112:
------------------------------------------

> 1. Preservation of comment data after parsing fields

This should not be a problem since every Field stores the original raw field string. The raw field string is used when writing the message. The only information lost is the kind of line delimiter that follows the field but this could easily be preserved, too.
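To illustrate the idea of keeping the raw field string next to the parsed view, here is a minimal sketch. RawField is a hypothetical class made up for this example, not Mime4j's actual Field implementation; it only assumes that the raw string is stored without the trailing line delimiter.

```java
// Hypothetical illustration only: this is NOT Mime4j's Field class.
final class RawField {
    // The field exactly as it appeared in the message (comments and
    // folding included), without the trailing line delimiter.
    private final String raw;

    RawField(String raw) {
        if (raw.indexOf(':') < 0) {
            throw new IllegalArgumentException("not a header field: " + raw);
        }
        this.raw = raw;
    }

    String getName() {
        return raw.substring(0, raw.indexOf(':')).trim();
    }

    // Parsed view: unfolds the body by dropping each line break that is
    // followed by a space or tab. The wrapping positions are lost here.
    String getBody() {
        String body = raw.substring(raw.indexOf(':') + 1);
        return body.replaceAll("\\r?\\n(?=[ \\t])", "").trim();
    }

    // Writing uses the raw string, so early wrapping and comments survive.
    String getRaw() {
        return raw;
    }

    public static void main(String[] args) {
        RawField field = new RawField("Subject: early\r\n wrapped (comment)");
        System.out.println(field.getName());
        System.out.println(field.getBody());
        System.out.println(field.getRaw().equals("Subject: early\r\n wrapped (comment)"));
    }
}
```

The parsed body no longer knows where the line was wrapped, but as long as the writer emits getRaw() for unmodified fields, nothing is lost.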

> Another difficulty for unlimited round tripping (without preserving the original bits) is how to record the header wrapping for unconventional wrapping schemes. For example, a message may choose to wrap header values early but this information is lost during parsing.

It is not - see above.

> 2. Preservation of information about character encoding in headers

The field string is built by AbstractEntity using ByteArrayBuffer and CharArrayBuffer. The CharArrayBuffer uses the following code for converting an input byte into a character: 

            int ch = b[i1]; 
            if (ch < 0) {
                ch = 256 + ch;
            }

It might not be obvious, but this is an ISO-8859-1 conversion (because Unicode code points 0000 to 00FF correspond directly to ISO-8859-1 byte codes 00 to FF).

So we would only have to use Latin 1 for writing the header fields.
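A small self-contained check of this claim, using only the JDK. The class and method names are made up for the example; toChar repeats the conversion quoted above, and the rest compares it against the JDK's ISO-8859-1 charset for all 256 byte values.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Latin1RoundTrip {
    // Same conversion as the quoted snippet: a signed Java byte
    // (-128..127) is mapped to the code point 0..255.
    static char toChar(byte b) {
        int ch = b;
        if (ch < 0) {
            ch = 256 + ch;
        }
        return (char) ch;
    }

    static String decodeManually(byte[] raw) {
        StringBuilder sb = new StringBuilder(raw.length);
        for (byte b : raw) {
            sb.append(toChar(b));
        }
        return sb.toString();
    }

    // Writing with Latin 1 maps every char 0..255 back to one byte.
    static byte[] writeLatin1(String field) {
        return field.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] raw = new byte[256];
        for (int i = 0; i < raw.length; i++) {
            raw[i] = (byte) i;
        }
        String manual = decodeManually(raw);
        String viaCharset = new String(raw, StandardCharsets.ISO_8859_1);
        System.out.println("same as ISO-8859-1 decoding: " + manual.equals(viaCharset));
        System.out.println("bytes restored on write: " + Arrays.equals(raw, writeLatin1(manual)));
    }
}
```

Note that this only restores the bytes; it says nothing about whether the resulting characters are the ones the sender intended.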

> 3. Ability to build mail which does comply with the specifications

Unclear to me; what specification are you referring to and how is this related to round tripping?

> My feeling is that - given the availability of standard meta-data+document representations - Mime4J should support only limited round tripping for mail building representations.

I don't agree because I think that perfect round tripping might be a prerequisite for S/MIME canonicalization (MIME4J-113). Canonicalization is useless if bits of the original content have already been lost.

From my point of view, Mime4j also has to preserve the original transfer encodings. Quoted-printable (and even base64) content cannot reliably be re-encoded to the same bytes it had originally. This might become nasty with inner encodings; for example, a message might contain another message that is transfer encoded entirely. Mime4j would have to parse that inner message only on demand.

Preserving the original transfer encodings clearly causes some overhead and should be optional IMO.
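The re-encoding problem can be shown with the JDK's own java.util.Base64 rather than Mime4j's codecs (this is a swapped-in illustration, not Mime4j code). The sample assumes a sender that wrapped its base64 lines after 8 characters, which is legal but unusual; decoding and re-encoding gives the same content but different bytes on the wire.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

class TransferEncodingDemo {
    // The MIME decoder ignores line breaks inside the encoded text.
    static byte[] decode(String encoded) {
        return Base64.getMimeDecoder().decode(encoded);
    }

    // The MIME encoder chooses its own line length (76 characters).
    static String reencode(byte[] content) {
        return Base64.getMimeEncoder().encodeToString(content);
    }

    public static void main(String[] args) {
        // "Hello, world!" wrapped early by a hypothetical sender.
        String original = "SGVsbG8s\r\nIHdvcmxk\r\nIQ==";
        byte[] content = decode(original);
        String rewritten = reencode(content);

        System.out.println(new String(content, StandardCharsets.US_ASCII));
        System.out.println("same content: " + Arrays.equals(content, decode(rewritten)));
        System.out.println("same encoded bytes: " + original.equals(rewritten));
    }
}
```

Quoted-printable has even more freedom (soft line breaks, optional escaping, hex digit case), so the only safe way to reproduce the original bytes is to keep them.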

I think there is not much else to it. The kind of line delimiters between header and body maybe.



[jira] Commented: (MIME4J-112) Define Limits Of Round Tripping In Mime4J

Posted by "Robert Burrell Donkin (JIRA)" <mi...@james.apache.org>.
    [ https://issues.apache.org/jira/browse/MIME4J-112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12671418#action_12671418 ] 

Robert Burrell Donkin commented on MIME4J-112:
----------------------------------------------

 "If the input message has been generated by mime4j then the round tripping have to be bitwise identical." 

is (for me) unlimited support for round tripping.

I think that it is an open question whether it is worthwhile for Mime4J to support unlimited round tripping from meta-data. The best way to preserve a bitwise identical message is to preserve the bits. (This is the approach suggested by Jukka and Noel, and is the one that IMAP uses.) One of my personal goals for Mime4J is to work on standard meta-data+document representations with persistent storage (based on use cases in James). By preserving the original bits, this approach would allow unlimited round tripping but at the cost of additional memory usage.
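A rough sketch of what "preserve the bits" could look like: the original bytes are stored untouched and the meta-data is only a derived, read-only view. PreservedMessage and its methods are hypothetical names for this example, and the header extraction is deliberately simplified (no folding, no repeated fields).

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only, not a Mime4j or James class.
final class PreservedMessage {
    private final byte[] original;               // the bits, kept verbatim
    private final Map<String, String> metadata;  // derived view, for queries only

    PreservedMessage(byte[] original) {
        this.original = original.clone();
        this.metadata = extractHeaders(this.original);
    }

    // Lookup by lower-cased field name; returns null if absent.
    String header(String name) {
        return metadata.get(name.toLowerCase(Locale.ROOT));
    }

    // Round tripping is trivially bitwise identical: return the stored bits.
    byte[] toByteArray() {
        return original.clone();
    }

    // Simplified extraction: one field per line, stops at the blank line.
    private static Map<String, String> extractHeaders(byte[] bytes) {
        String text = new String(bytes, StandardCharsets.ISO_8859_1);
        Map<String, String> result = new LinkedHashMap<>();
        for (String line : text.split("\\r?\\n")) {
            if (line.isEmpty()) {
                break; // end of header
            }
            int colon = line.indexOf(':');
            if (colon > 0) {
                result.put(line.substring(0, colon).trim().toLowerCase(Locale.ROOT),
                        line.substring(colon + 1).trim());
            }
        }
        return result;
    }
}
```

The cost is visible in the fields: the message is held twice, once as bytes and once as meta-data.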

A commitment to supporting unlimited round tripping (without preserving the original bits) would require some tradeoffs in terms of code complexity and performance. Here are a few examples:

1. Preservation of comment data after parsing fields
2. Preservation of information about character encoding in headers 
3. Ability to build mail which does comply with the specifications

My feeling is that - given the availability of standard meta-data+document representations - Mime4J should support only limited round tripping for mail building representations. 



[jira] Commented: (MIME4J-112) Define Limits Of Round Tripping In Mime4J

Posted by "Robert Burrell Donkin (JIRA)" <mi...@james.apache.org>.
    [ https://issues.apache.org/jira/browse/MIME4J-112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12671437#action_12671437 ] 

Robert Burrell Donkin commented on MIME4J-112:
----------------------------------------------

Another difficulty for unlimited round tripping (without preserving the original bits) is how to record the header wrapping for unconventional wrapping schemes. For example, a message may choose to wrap header values early but this information is lost during parsing.



[jira] Commented: (MIME4J-112) Define Limits Of Round Tripping In Mime4J

Posted by "Stefano Bagnara (JIRA)" <mi...@james.apache.org>.
    [ https://issues.apache.org/jira/browse/MIME4J-112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12671347#action_12671347 ] 

Stefano Bagnara commented on MIME4J-112:
----------------------------------------

I think that at least this should be satisfied:
"If the input message has been generated by mime4j then the round tripping have to be bitwise identical."
(Maybe a native English speaker can reword it; I hope it is understandable.)

IIRC, I added a test suite to check that each "expected result" in our test suite is identical to the result of parsing it and writing it out again.




[jira] Commented: (MIME4J-112) Define Limits Of Round Tripping In Mime4J

Posted by "Stefano Bagnara (JIRA)" <mi...@james.apache.org>.
    [ https://issues.apache.org/jira/browse/MIME4J-112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12671462#action_12671462 ] 

Stefano Bagnara commented on MIME4J-112:
----------------------------------------

@Robert: I guess my explanation was not clear, because your concerns do not apply to my "requirement".
"1. Preservation of comment data after parsing fields"
Does mime4j write comments in its output? If so, then it must be able to parse them. If it parses a comment written by itself and then writes it to the output again, how can this be different from the previous one? Can you give an example? (I mean, if mime4j loses comments, then they will be lost at the first pass and we can ignore them for my "internal round tripping" requirement.)

"2. Preservation of information about character encoding in headers"
Again, if mime4j alters the encoding during the write, I would expect it to always use the same encoding in the same scenario. What would break my requirement is something like an "alternating" behaviour: let's say mime4j converts qp to base64 and base64 to qp at every parse=>write execution. This would result in A->B->A->B->A->B behaviour, which would be unacceptable. But A->B->B->B and B->A->A->A->A are both acceptable to me.

"3. Ability to build mail which does comply with the specifications"
Compliance is important, but it is unrelated to round tripping, IMO.
I would expect mime4j to write compliant messages and to be able to parse them. In any case where mime4j writes a malformed message, I want it to be able to parse it again, and a subsequent write to a stream should result in the same malformed message.
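The requirement above (A->B->B is fine, A->B->A is not) can be written down as a small check. This is only a sketch: the round trip is passed in as a function from bytes to bytes, and the two sample round trips are toy stand-ins invented for the example, not Mime4j behaviour.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.function.UnaryOperator;

class RoundTripStability {
    // True if the output of the first pass survives a second pass
    // unchanged, i.e. the behaviour is A->B->B rather than A->B->A.
    static boolean isStableAfterFirstPass(UnaryOperator<byte[]> roundTrip, byte[] input) {
        byte[] first = roundTrip.apply(input);
        byte[] second = roundTrip.apply(first);
        return Arrays.equals(first, second);
    }

    // Toy round trip that rewrites every line ending as CRLF.
    // It changes some inputs, but its own output is a fixed point.
    static byte[] normalizeLineEndings(byte[] in) {
        String s = new String(in, StandardCharsets.ISO_8859_1);
        return s.replaceAll("\\r?\\n", "\r\n").getBytes(StandardCharsets.ISO_8859_1);
    }

    // Toy round trip that swaps the encoding label on every pass,
    // which is the unacceptable alternating behaviour.
    static byte[] flipEncodingLabel(byte[] in) {
        String s = new String(in, StandardCharsets.ISO_8859_1);
        String out = s.contains("base64")
                ? s.replace("base64", "quoted-printable")
                : s.replace("quoted-printable", "base64");
        return out.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] message = "Content-Transfer-Encoding: base64\n\nAAAA\n"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(isStableAfterFirstPass(RoundTripStability::normalizeLineEndings, message));
        System.out.println(isStableAfterFirstPass(RoundTripStability::flipEncodingLabel, message));
    }
}
```

In a real test the function argument would be "parse with mime4j, then write to a byte array", applied to every sample message in the test suite.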
