
Posted to mime4j-dev@james.apache.org by "Oleg Kalnichevski (JIRA)" <mi...@james.apache.org> on 2009/02/16 13:26:59 UTC

[jira] Created: (MIME4J-118) MIME stream parser handles non-ASCII fields incorrectly

MIME stream parser handles non-ASCII fields incorrectly
-------------------------------------------------------

                 Key: MIME4J-118
                 URL: https://issues.apache.org/jira/browse/MIME4J-118
             Project: JAMES Mime4j
          Issue Type: Bug
            Reporter: Oleg Kalnichevski
             Fix For: 0.6


Presently MIME stream parser handles non-ASCII fields incorrectly. Binary field content gets converted to its textual representation too early in the parsing process using a simple byte-to-char cast. The decision about appropriate char encoding should be left up to individual ContentHandler implementations.
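To make the problem concrete, here is a minimal Java sketch contrasting a per-byte cast with decoding through an explicitly chosen charset. It is not mime4j code: `HeaderDecoding`, `castDecode` and `decode` are hypothetical names, and the input is assumed to be an unencoded UTF-8 header, which is exactly the non-compliant case the issue is about.

```java
import java.nio.charset.Charset;

// Hypothetical helper for illustration; not part of the mime4j API.
final class HeaderDecoding {
    private HeaderDecoding() {}

    // The behaviour the issue describes: one char per byte. Even with the
    // 0xFF mask (which avoids sign extension), this is ISO-8859-1 decoding
    // in disguise, so multi-byte UTF-8 sequences turn into mojibake.
    static String castDecode(byte[] raw) {
        StringBuilder sb = new StringBuilder(raw.length);
        for (byte b : raw) {
            sb.append((char) (b & 0xFF));
        }
        return sb.toString();
    }

    // The behaviour the issue asks for: keep the bytes and let the
    // consumer (for example a ContentHandler) choose the charset.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }
}
```

For the UTF-8 bytes of a subject such as `Grüße`, `castDecode` yields mojibake beginning with `GrÃ¼`, while `decode(raw, StandardCharsets.UTF_8)` recovers the original text. The point is only that the charset choice has to happen after the bytes are preserved, not during tokenizing.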

Oleg

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Re: [jira] Commented: (MIME4J-118) MIME stream parser handles non-ASCII fields incorrectly

Posted by Markus Wiederkehr <ma...@gmail.com>.

On Tue, Feb 24, 2009 at 10:23 PM, Robert Burrell Donkin
<ro...@gmail.com> wrote:
> <snip>
>
> i worry about the quantity of copying and new buffers that will need
> to be created to store a single complex, large document when every
> component has to be stored as a string and also as bytes to ensure
> round tripping in non-compliant corner cases.

Well at least I am confident that Mime4j does not perform worse than
it did before. Field always held the raw data. Only now it's a
ByteSequence instead of a String. Both are immutable and are not
copied because of that.

Markus

Re: [jira] Commented: (MIME4J-118) MIME stream parser handles non-ASCII fields incorrectly

Posted by Stefano Bagnara <ap...@bago.org>.

Robert Burrell Donkin ha scritto:
> [...]
> IIRC in a multipart document, the mime headers must be encoded in
> ASCII. so, the first level headers can all be accessed through byte
> offsets. a part may contain a transfer encoded document. there are a
> couple of distinct cases which are interesting: when the document is
> an embedded message or an embedded multipart document. when this is
> encoded in Base64 then a bytewise offset is not available in the
> original stream but is from the decoded stream. so, the bytewise
> offset in the decoding stream can be used. this is a rare use case and
> though the approach would be slow in this case, it would be a rare
> one.

FYI
http://issues.apache.org/jira/browse/MIME4J-114?focusedCommentId=12671463#action_12671463
-----
the content-transfer-encoding for a multipart message SHOULD always be
7bit, 8bit or binary to avoid nested encoding/decoding operations.

Javamail from 1.4 ignores a content-transfer-encoding quoted-printable
or base64 for a multipart message by default, while previous javamail
versions correctly parsed nested encodings. Javamail 1.4 provides a flag
to enable the nested encodings (for backward compatibility):
mail.mime.ignoremultipartencoding
-----
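RFC 2045 (section 6.4) is in fact stricter than SHOULD: a multipart entity is not permitted any Content-Transfer-Encoding other than 7bit, 8bit or binary, and an absent header means 7bit. A parser could use a check like the following to decide whether byte offsets in the original stream are still offsets into a multipart body. The class and method names are hypothetical, not mime4j or JavaMail API.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helper; not part of mime4j or JavaMail.
final class MultipartEncodingCheck {
    private MultipartEncodingCheck() {}

    // Identity encodings: the bytes on the wire are the body bytes.
    private static final Set<String> IDENTITY = Set.of("7bit", "8bit", "binary");

    // Returns true when a multipart body can be parsed in place, i.e.
    // offsets in the original stream are also offsets in the content.
    // A missing Content-Transfer-Encoding defaults to 7bit per RFC 2045.
    static boolean isIdentityEncoding(String contentTransferEncoding) {
        if (contentTransferEncoding == null) {
            return true;
        }
        String token = contentTransferEncoding.trim().toLowerCase(Locale.ROOT);
        return IDENTITY.contains(token);
    }
}
```

A lenient parser that, like pre-1.4 JavaMail, still decodes nested quoted-printable or base64 would take the slower decoded-stream path only when this returns false.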

Stefano

Re: [jira] Commented: (MIME4J-118) MIME stream parser handles non-ASCII fields incorrectly

Posted by Robert Burrell Donkin <ro...@gmail.com>.

On Tue, Feb 24, 2009 at 7:59 PM, Markus Wiederkehr
<ma...@gmail.com> wrote:
> On Tue, Feb 24, 2009 at 2:46 PM, Robert Burrell Donkin (JIRA)
> <mi...@james.apache.org> wrote:
>>
>>    [ https://issues.apache.org/jira/browse/MIME4J-118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676270#action_12676270 ]
>>
>> Robert Burrell Donkin commented on MIME4J-118:
>> ----------------------------------------------
>>
>> I suspect that there may be longer term issues with this general approach but i think we should accept that the current proposal is good enough for this release. release early, release often.
>
> +1 on the release part but I need a few days to clean up that patch.

fine

>> I think that the best approach is to preserve the original document together with boundary meta-data: in other words, to record that a 'Content-Type' header starts at byte 99 in the document, rather than trying to slice up the document and re-assemble it from lots of small byte buffers. But this is related to other issues which should wait until after this release, so I think we should patch and look to ship.
>
> We can cross that bridge when we come to it but I don't particularly
> like the idea of having to open a file, seek to position 99 and read
> 50 bytes just to obtain the raw value of a Content-Type field, for
> example.

nio manages this quite adequately ;-)
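Robert's offset idea can be sketched with `java.nio`: record where each raw field starts and how long it is, then slice a read-only view of the original buffer on demand, so nothing is copied. `FieldLocation` is a hypothetical name, and the sketch uses an in-memory buffer; for a file on disk the buffer could come from `FileChannel.map`, which is not shown here.

```java
import java.nio.ByteBuffer;

// Hypothetical: where a raw header field lives in the original message.
final class FieldLocation {
    final int start;   // byte offset of the field name in the original
    final int length;  // length of the raw field, excluding the final CRLF

    FieldLocation(int start, int length) {
        this.start = start;
        this.length = length;
    }

    // Returns a read-only view of the raw field bytes; no copy is made.
    // The caller's buffer position and limit are left untouched.
    ByteBuffer slice(ByteBuffer original) {
        ByteBuffer view = original.asReadOnlyBuffer();
        view.position(start);
        view.limit(start + length);
        return view.slice();
    }
}
```

The trade-off Markus raises still applies: a view like this is only available for parsed fields whose original buffer is kept alive.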

i worry about the quantity of copying and new buffers that will need
to be created to store a single complex, large document when every
component has to be stored as a string and also as bytes to ensure
round tripping in non-compliant corner cases. i would much rather
encourage users to retain the original when absolute fidelity is
required.

> Also please keep in mind that Field instances may be shared between multiple
> messages and they can be created from a constructor or factory without
> an original document to back them up.

the difficult problems with round tripping should not occur when
fields are created programmatically

> And last but not least with nested encodings there is no meaningful
> offset into a file..

i'm not sure i agree with that

IIRC in a multipart document, the mime headers must be encoded in
ASCII. so, the first level headers can all be accessed through byte
offsets. a part may contain a transfer encoded document. there are a
couple of distinct cases which are interesting: when the document is
an embedded message or an embedded multipart document. when this is
encoded in Base64 then a bytewise offset is not available in the
original stream but is from the decoded stream. so, the bytewise
offset in the decoding stream can be used. this is a rare use case and
though the approach would be slow in this case, it would be a rare
one.
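For the rare base64-wrapped case, offsets can only be taken against the decoded content. A minimal sketch using `java.util.Base64` (Java 8+); `DecodedOffsets` is a hypothetical name, it decodes the whole part into memory (a streaming parser would not), and the header-name match is case-sensitive for brevity even though MIME header names are not.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical illustration of offsets into a decoded, not original, stream.
final class DecodedOffsets {
    private DecodedOffsets() {}

    // Decodes a base64-encoded embedded message; the MIME decoder
    // tolerates the line breaks found in real message bodies.
    static byte[] decodePart(String base64Body) {
        return Base64.getMimeDecoder().decode(base64Body);
    }

    // Offset of the first line in the decoded bytes that starts with the
    // given ASCII header name, or -1 if there is none. The offset is only
    // meaningful relative to the decoded content.
    static int headerOffset(byte[] decoded, String name) {
        byte[] needle = name.getBytes(StandardCharsets.US_ASCII);
        outer:
        for (int i = 0; i + needle.length <= decoded.length; i++) {
            boolean lineStart = i == 0 || decoded[i - 1] == '\n';
            if (!lineStart) {
                continue;
            }
            for (int j = 0; j < needle.length; j++) {
                if (decoded[i + j] != needle[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }
}
```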

- robert

Re: [jira] Commented: (MIME4J-118) MIME stream parser handles non-ASCII fields incorrectly

Posted by Markus Wiederkehr <ma...@gmail.com>.

On Tue, Feb 24, 2009 at 2:46 PM, Robert Burrell Donkin (JIRA)
<mi...@james.apache.org> wrote:
>
>    [ https://issues.apache.org/jira/browse/MIME4J-118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676270#action_12676270 ]
>
> Robert Burrell Donkin commented on MIME4J-118:
> ----------------------------------------------
>
> I suspect that there may be longer term issues with this general approach but i think we should accept that the current proposal is good enough for this release. release early, release often.

+1 on the release part but I need a few days to clean up that patch.

> I think that the best approach is to preserve the original document together with boundary meta-data: in other words, to record that a 'Content-Type' header starts at byte 99 in the document, rather than trying to slice up the document and re-assemble it from lots of small byte buffers. But this is related to other issues which should wait until after this release, so I think we should patch and look to ship.

We can cross that bridge when we come to it but I don't particularly
like the idea of having to open a file, seek to position 99 and read
50 bytes just to obtain the raw value of a Content-Type field, for
example.

Also please keep in mind that Field instances may be shared between multiple
messages and they can be created from a constructor or factory without
an original document to back them up.

And last but not least with nested encodings there is no meaningful
offset into a file..

Markus


>> MIME stream parser handles non-ASCII fields incorrectly
>> -------------------------------------------------------
>>
>>                 Key: MIME4J-118
>>                 URL: https://issues.apache.org/jira/browse/MIME4J-118
>>             Project: JAMES Mime4j
>>          Issue Type: Bug
>>            Reporter: Oleg Kalnichevski
>>            Assignee: Oleg Kalnichevski
>>             Fix For: 0.6
>>
>>         Attachments: mime4j-118-bytesequence-draft.patch, mime4j-118-field.patch, mimej4-118.patch
>>
>>
>> Presently MIME stream parser handles non-ASCII fields incorrectly. Binary field content gets converted to its textual representation too early in the parsing process using simple byte to char cast. The decision about appropriate char encoding should be left up to individual ContentHandler implementations.
>> Oleg
>

Re: [jira] Updated: (MIME4J-118) MIME stream parser handles non-ASCII fields incorrectly

Posted by Oleg Kalnichevski <ol...@apache.org>.

On Fri, 2009-02-20 at 13:06 +0100, Markus Wiederkehr wrote:
> On Thu, Feb 19, 2009 at 6:06 PM, Oleg Kalnichevski (JIRA)
> <mi...@james.apache.org> wrote:
> >
> >     [ https://issues.apache.org/jira/browse/MIME4J-118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
> >
> > Oleg Kalnichevski updated MIME4J-118:
> > -------------------------------------
> >
> >    Attachment: mime4j-118-field.patch
> >
> > Markus,
> >
> > Let's try a different approach to the problem. The new patch changes representation of a MIME field in the API by replacing name/value/raw tuple with Field interface. If you like this patch better, I'll look into changing the representation of raw field content from String to ByteArrayBuffer or similar immutable class. As a next step I would look into resolving MIME4J-116.
> 
> Hi Oleg,
> 
> sorry for the late response, I was a bit busy the last few days..
> 

No problem. It is good to be busy these days.


> The Field interface patch looks good to me, feel free to commit. Your
> idea of providing an immutable type that can internally be cast to
> something else for performance optimization sounds good as well.
> 
> One small thing: the Javadoc of Field describes the body as "unfolded
> unparsed field body string". I'm not sure if this is correct. Do you
> mean the field has already been unfolded by the parser at this point?
> 

I must confess I did not really do a good job on updating the javadocs.
The above statement is certainly not correct. I'll fix it before I
commit.

Cheers

Oleg


> Markus
> 
> 
> >
> > Oleg
> >
> >> MIME stream parser handles non-ASCII fields incorrectly
> >> -------------------------------------------------------
> >>
> >>                 Key: MIME4J-118
> >>                 URL: https://issues.apache.org/jira/browse/MIME4J-118
> >>             Project: JAMES Mime4j
> >>          Issue Type: Bug
> >>            Reporter: Oleg Kalnichevski
> >>            Assignee: Oleg Kalnichevski
> >>             Fix For: 0.6
> >>
> >>         Attachments: mime4j-118-field.patch, mime4j-118.patch
> >>
> >>
> >> Presently MIME stream parser handles non-ASCII fields incorrectly. Binary field content gets converted to its textual representation too early in the parsing process using simple byte to char cast. The decision about appropriate char encoding should be left up to individual ContentHandler implementations.
> >> Oleg
> >

Re: [jira] Updated: (MIME4J-118) MIME stream parser handles non-ASCII fields incorrectly

Posted by Markus Wiederkehr <ma...@gmail.com>.

On Thu, Feb 19, 2009 at 6:06 PM, Oleg Kalnichevski (JIRA)
<mi...@james.apache.org> wrote:
>
>     [ https://issues.apache.org/jira/browse/MIME4J-118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>
> Oleg Kalnichevski updated MIME4J-118:
> -------------------------------------
>
>    Attachment: mime4j-118-field.patch
>
> Markus,
>
> Let's try a different approach to the problem. The new patch changes representation of a MIME field in the API by replacing name/value/raw tuple with Field interface. If you like this patch better, I'll look into changing the representation of raw field content from String to ByteArrayBuffer or similar immutable class. As a next step I would look into resolving MIME4J-116.

Hi Oleg,

sorry for the late response, I was a bit busy the last few days..

The Field interface patch looks good to me, feel free to commit. Your
idea of providing an immutable type that can internally be cast to
something else for performance optimization sounds good as well.

One small thing: the Javadoc of Field describes the body as "unfolded
unparsed field body string". I'm not sure if this is correct. Do you
mean the field has already been unfolded by the parser at this point?
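The distinction Markus raises matters because "raw" and "unfolded" are different representations. Per RFC 5322 section 2.2.3, unfolding removes each CRLF that is immediately followed by whitespace, keeping the whitespace itself. A hypothetical helper (not the mime4j `Field` API) that derives the unfolded but otherwise unparsed body from a raw field:

```java
// Hypothetical helper; mime4j's Field javadoc and API may differ.
final class Unfolding {
    private Unfolding() {}

    // RFC 5322 unfolding: drop every CRLF immediately followed by
    // SP or HTAB. A CRLF not followed by whitespace is left alone.
    static String unfold(String raw) {
        return raw.replaceAll("\r\n(?=[ \t])", "");
    }

    // Unfolded, unparsed body: everything after the first colon,
    // including the leading whitespace.
    static String unfoldedBody(String rawField) {
        int colon = rawField.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("not a header field: " + rawField);
        }
        return unfold(rawField.substring(colon + 1));
    }
}
```

So a raw `Subject: a very` CRLF ` long subject` has the unfolded body ` a very long subject`; a Javadoc describing the raw value as "unfolded" would therefore be describing a different string.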

Markus


>
> Oleg
>
>> MIME stream parser handles non-ASCII fields incorrectly
>> -------------------------------------------------------
>>
>>                 Key: MIME4J-118
>>                 URL: https://issues.apache.org/jira/browse/MIME4J-118
>>             Project: JAMES Mime4j
>>          Issue Type: Bug
>>            Reporter: Oleg Kalnichevski
>>            Assignee: Oleg Kalnichevski
>>             Fix For: 0.6
>>
>>         Attachments: mime4j-118-field.patch, mime4j-118.patch
>>
>>
>> Presently MIME stream parser handles non-ASCII fields incorrectly. Binary field content gets converted to its textual representation too early in the parsing process using simple byte to char cast. The decision about appropriate char encoding should be left up to individual ContentHandler implementations.
>> Oleg
>




Re: [jira] Assigned: (MIME4J-118) MIME stream parser handles non-ASCII fields incorrectly

Posted by Oleg Kalnichevski <ol...@apache.org>.

Markus Wiederkehr wrote:
> In my opinion this issue is closely related to MIME4J-112 and MIME4J-116.
> 
> I think that in the course of MIME4J-116 we should (maybe) create
> Field instances in AbstractEntity instead of later on in
> MessageBuilder. A Field object could store the raw data in a byte[]
> instead of a String which would greatly help with MIME4J-112.
> 

I would much rather not couple the MIME entity classes with
Field, if possible.

> The only problem is that the charset for a lenient parsing mode is not
> known at this early point. But considering your clarification about
> the lenient writing mode I wonder if anybody really needs a lenient
> parsing mode. (I wonder if anyone really needs a lenient writing mode
> for that matter.)
> 
> So maybe AbstractEntity should simply use US-ASCII to decode the
> header fields without direct support for a lenient parsing mode that
> nobody needs. Then AbstractEntity can build Field instances and a
> ContentHandler receives those Field instances without having to parse
> them again.
> 
> All in all I'm not sure if #118 should be addressed independently of
> 112 and 116 and whether 118 should be targeted for 0.6..
> 

I personally dislike 'big-bang' style refactoring and prefer smaller 
incremental changes when lower level components get fixed first and 
remaining issues get sort of 'pushed' upwards to the higher level 
components.

I'll have a patch ready by tomorrow noon. If it gets rejected, let us 
revisit the idea of fixing #118, #112 and #116 all at the same time.

Cheers

Oleg

> But those are just my 2 cents,
> 
> Markus
> 
> 
> On Mon, Feb 16, 2009 at 1:27 PM, Oleg Kalnichevski (JIRA)
> <mi...@james.apache.org> wrote:
>>     [ https://issues.apache.org/jira/browse/MIME4J-118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>>
>> Oleg Kalnichevski reassigned MIME4J-118:
>> ----------------------------------------
>>
>>    Assignee: oleg.kalnichevski
>>
>> Working on a patch
>>
>> Oleg
>>
>>> MIME stream parser handles non-ASCII fields incorrectly
>>> -------------------------------------------------------
>>>
>>>                 Key: MIME4J-118
>>>                 URL: https://issues.apache.org/jira/browse/MIME4J-118
>>>             Project: JAMES Mime4j
>>>          Issue Type: Bug
>>>            Reporter: Oleg Kalnichevski
>>>            Assignee: oleg.kalnichevski
>>>             Fix For: 0.6
>>>
>>>
>>> Presently MIME stream parser handles non-ASCII fields incorrectly. Binary field content gets converted to its textual representation too early in the parsing process using simple byte to char cast. The decision about appropriate char encoding should be left up to individual ContentHandler implementations.
>>> Oleg

Re: [jira] Assigned: (MIME4J-118) MIME stream parser handles non-ASCII fields incorrectly

Posted by Markus Wiederkehr <ma...@gmail.com>.

On Mon, Feb 16, 2009 at 2:49 PM, Stefano Bagnara <ap...@bago.org> wrote:
> Markus Wiederkehr ha scritto:
>> In my opinion this issue is closely related to MIME4J-112 and MIME4J-116.
>>
>> I think that in the course of MIME4J-116 we should (maybe) create
>> Field instances in AbstractEntity instead of later on in
>> MessageBuilder. A Field object could store the raw data in a byte[]
>> instead of a String which would greatly help with MIME4J-112.
>>
>> The only problem is that the charset for a lenient parsing mode is not
>> known at this early point. But considering your clarification about
>> the lenient writing mode I wonder if anybody really needs a lenient
>> parsing mode. (I wonder if anyone really needs a lenient writing mode
>> for that matter.)
>
> Lenient Writing IMO is only needed if you need roundtrip. For
> standard/most MIME4J usages I don't see why we should write malformed
> data in output.

In my opinion Field should preserve the original bytes in a byte
array. Writing a message could simply use these original bytes and
there would be no roundtrip issues. Essentially there would be only
one writing mode.
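A single writing mode along these lines might look like the sketch below: a parsed field writes back the bytes it was read from, and only a programmatically created field is formatted. `WritableField` and `writeTo` are hypothetical names, not mime4j API, and the fallback formatting is deliberately naive (no folding, no RFC 2047 encoding, ASCII-only name and body assumed).

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical field type; raw == null means the field was created
// programmatically (constructor/factory) rather than parsed.
final class WritableField {
    private final String name;
    private final String body;
    private final byte[] raw;

    WritableField(String name, String body, byte[] raw) {
        this.name = name;
        this.body = body;
        this.raw = raw == null ? null : raw.clone(); // keep immutability
    }

    void writeTo(OutputStream out) throws IOException {
        if (raw != null) {
            // Exact round trip, even for unencoded 8-bit input.
            out.write(raw);
        } else {
            out.write((name + ": " + body).getBytes(StandardCharsets.US_ASCII));
        }
        out.write('\r');
        out.write('\n');
    }
}
```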

In addition, I would like to have a "visitor" or whatever that can be
used to tidy up a message.

> Lenient reading, on the other hand, is part of being a generic parsing library:
> most email clients correctly handle 8bit chars in the Subject header
> because some email clients happen to write them unencoded. If you
> think mime4j could be used as the library for an email client, it is
> probably still worth handling 8bit chars in the headers.
> Of course there is no need to implement such a feature until someone
> really asks for it or needs it.

My approach would still allow for that with a little overhead. If a
ContentHandler receives a Field and that field contains the original
raw bytes then nothing prevents the ContentHandler from parsing the
field again, using any charset determined by whatever means. Also
structured fields are parsed lazily so the overhead would not be
tremendous.
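As an illustration of a ContentHandler re-parsing raw bytes "with any charset determined by whatever means", one common heuristic is to try strict UTF-8 first (pure ASCII is valid UTF-8) and fall back to a legacy charset. The heuristic, the fallback choice and the class name are assumptions for this sketch, not behaviour mime4j prescribes; note that some legacy byte sequences happen to be valid UTF-8 and would be misread.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical lenient decoder a handler might apply to raw field bytes.
final class LenientHeaderDecoder {
    private LenientHeaderDecoder() {}

    // Strict UTF-8 first; on malformed input, decode with the fallback.
    static String decode(byte[] raw, Charset fallback) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(raw))
                    .toString();
        } catch (CharacterCodingException e) {
            return new String(raw, fallback);
        }
    }
}
```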

> I don't really know how many email messages contain unencoded
> headers nowadays. 10 years ago, when I checked this stuff in depth, almost 40% of
> international emails included unencoded headers. I expect this
> percentage to be much less today, but I don't know if it is 10% or 0.1%.
>
> Stefano
>
>> So maybe AbstractEntity should simply use US-ASCII to decode the
>> header fields without direct support for a lenient parsing mode that
>> nobody needs. Then AbstractEntity can build Field instances and a
>> ContentHandler receives those Field instances without having to parse
>> them again.
>>
>> All in all I'm not sure if #118 should be addressed independently of
>> 112 and 116 and whether 118 should be targeted for 0.6..
>>
>> But those are just my 2 cents,
>>
>> Markus
>>
>>
>> On Mon, Feb 16, 2009 at 1:27 PM, Oleg Kalnichevski (JIRA)
>> <mi...@james.apache.org> wrote:
>>>     [ https://issues.apache.org/jira/browse/MIME4J-118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>>>
>>> Oleg Kalnichevski reassigned MIME4J-118:
>>> ----------------------------------------
>>>
>>>    Assignee: oleg.kalnichevski
>>>
>>> Working on a patch
>>>
>>> Oleg
>>>
>>>> MIME stream parser handles non-ASCII fields incorrectly
>>>> -------------------------------------------------------
>>>>
>>>>                 Key: MIME4J-118
>>>>                 URL: https://issues.apache.org/jira/browse/MIME4J-118
>>>>             Project: JAMES Mime4j
>>>>          Issue Type: Bug
>>>>            Reporter: Oleg Kalnichevski
>>>>            Assignee: oleg.kalnichevski
>>>>             Fix For: 0.6
>>>>
>>>>
>>>> Presently MIME stream parser handles non-ASCII fields incorrectly. Binary field content gets converted to its textual representation too early in the parsing process using simple byte to char cast. The decision about appropriate char encoding should be left up to individual ContentHandler implementations.
>>>> Oleg

Re: [jira] Assigned: (MIME4J-118) MIME stream parser handles non-ASCII fields incorrectly

Posted by Stefano Bagnara <ap...@bago.org>.

Markus Wiederkehr ha scritto:
> In my opinion this issue is closely related to MIME4J-112 and MIME4J-116.
> 
> I think that in the course of MIME4J-116 we should (maybe) create
> Field instances in AbstractEntity instead of later on in
> MessageBuilder. A Field object could store the raw data in a byte[]
> instead of a String which would greatly help with MIME4J-112.
> 
> The only problem is that the charset for a lenient parsing mode is not
> known at this early point. But considering your clarification about
> the lenient writing mode I wonder if anybody really needs a lenient
> parsing mode. (I wonder if anyone really needs a lenient writing mode
> for that matter.)

Lenient Writing IMO is only needed if you need roundtrip. For
standard/most MIME4J usages I don't see why we should write malformed
data in output.

Lenient reading, on the other hand, is part of being a generic parsing library:
most email clients correctly handle 8bit chars in the Subject header
because some email clients happen to write them unencoded. If you
think mime4j could be used as the library for an email client, it is
probably still worth handling 8bit chars in the headers.
Of course there is no need to implement such a feature until someone
really asks for it or needs it.

I don't really know how many email messages contain unencoded
headers nowadays. 10 years ago, when I checked this stuff in depth, almost 40% of
international emails included unencoded headers. I expect this
percentage to be much less today, but I don't know if it is 10% or 0.1%.

Stefano

> So maybe AbstractEntity should simply use US-ASCII to decode the
> header fields without direct support for a lenient parsing mode that
> nobody needs. Then AbstractEntity can build Field instances and a
> ContentHandler receives those Field instances without having to parse
> them again.
> 
> All in all I'm not sure if #118 should be addressed independently of
> 112 and 116 and whether 118 should be targeted for 0.6..
> 
> But those are just my 2 cents,
> 
> Markus
> 
> 
> On Mon, Feb 16, 2009 at 1:27 PM, Oleg Kalnichevski (JIRA)
> <mi...@james.apache.org> wrote:
>>     [ https://issues.apache.org/jira/browse/MIME4J-118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>>
>> Oleg Kalnichevski reassigned MIME4J-118:
>> ----------------------------------------
>>
>>    Assignee: oleg.kalnichevski
>>
>> Working on a patch
>>
>> Oleg
>>
>>> MIME stream parser handles non-ASCII fields incorrectly
>>> -------------------------------------------------------
>>>
>>>                 Key: MIME4J-118
>>>                 URL: https://issues.apache.org/jira/browse/MIME4J-118
>>>             Project: JAMES Mime4j
>>>          Issue Type: Bug
>>>            Reporter: Oleg Kalnichevski
>>>            Assignee: oleg.kalnichevski
>>>             Fix For: 0.6
>>>
>>>
>>> Presently MIME stream parser handles non-ASCII fields incorrectly. Binary field content gets converted to its textual representation too early in the parsing process using simple byte to char cast. The decision about appropriate char encoding should be left up to individual ContentHandler implementations.
>>> Oleg

Re: [jira] Assigned: (MIME4J-118) MIME stream parser handles non-ASCII fields incorrectly

Posted by Markus Wiederkehr <ma...@gmail.com>.

In my opinion this issue is closely related to MIME4J-112 and MIME4J-116.

I think that in the course of MIME4J-116 we should (maybe) create
Field instances in AbstractEntity instead of later on in
MessageBuilder. A Field object could store the raw data in a byte[]
instead of a String which would greatly help with MIME4J-112.

The only problem is that the charset for a lenient parsing mode is not
known at this early point. But considering your clarification about
the lenient writing mode I wonder if anybody really needs a lenient
parsing mode. (I wonder if anyone really needs a lenient writing mode
for that matter.)

So maybe AbstractEntity should simply use US-ASCII to decode the
header fields without direct support for a lenient parsing mode that
nobody needs. Then AbstractEntity can build Field instances and a
ContentHandler receives those Field instances without having to parse
them again.
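If AbstractEntity decoded header fields strictly as US-ASCII, it would still need to notice non-ASCII bytes rather than silently mangle them. A hypothetical sketch (names are assumptions, not mime4j API) that locates the first offending byte and only decodes when ASCII decoding is lossless:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for a strict US-ASCII header path.
final class AsciiCheck {
    private AsciiCheck() {}

    // Index of the first byte outside US-ASCII (0x00-0x7F), or -1.
    static int firstNonAscii(byte[] raw) {
        for (int i = 0; i < raw.length; i++) {
            if ((raw[i] & 0x80) != 0) {
                return i;
            }
        }
        return -1;
    }

    // Decodes only when that is lossless; null signals that the raw
    // bytes should be kept for a lenient consumer to interpret.
    static String decodeStrict(byte[] raw) {
        return firstNonAscii(raw) < 0 ? new String(raw, StandardCharsets.US_ASCII) : null;
    }
}
```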

All in all I'm not sure if #118 should be addressed independently of
112 and 116 and whether 118 should be targeted for 0.6..

But those are just my 2 cents,

Markus

On Mon, Feb 16, 2009 at 1:27 PM, Oleg Kalnichevski (JIRA)
<mi...@james.apache.org> wrote:
>
>     [ https://issues.apache.org/jira/browse/MIME4J-118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>
> Oleg Kalnichevski reassigned MIME4J-118:
> ----------------------------------------
>
>    Assignee: oleg.kalnichevski
>
> Working on a patch
>
> Oleg
>
>> MIME stream parser handles non-ASCII fields incorrectly
>> -------------------------------------------------------
>>
>>                 Key: MIME4J-118
>>                 URL: https://issues.apache.org/jira/browse/MIME4J-118
>>             Project: JAMES Mime4j
>>          Issue Type: Bug
>>            Reporter: Oleg Kalnichevski
>>            Assignee: oleg.kalnichevski
>>             Fix For: 0.6
>>
>>
>> Presently MIME stream parser handles non-ASCII fields incorrectly. Binary field content gets converted to its textual representation too early in the parsing process using simple byte to char cast. The decision about appropriate char encoding should be left up to individual ContentHandler implementations.
>> Oleg
>

Re: [jira] Updated: (MIME4J-118) MIME stream parser handles non-ASCII fields incorrectly

Posted by Oleg Kalnichevski <ol...@apache.org>.

Markus Wiederkehr wrote:
> Let me reply to the mailing list to keep the discussion on the detail
> level away from the JIRA..
> 
> On Tue, Feb 17, 2009 at 12:29 PM, Oleg Kalnichevski (JIRA)
> <mi...@james.apache.org> wrote:
>>     [ https://issues.apache.org/jira/browse/MIME4J-118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>>
>> Oleg Kalnichevski updated MIME4J-118:
>> -------------------------------------
>>
>>    Attachment: mime4j-118.patch
>>
>> Here is the first cut at fixing the problem with charset coding in MIME fields. The significant changes are
>>
>> (1) EntityStateMachine#getField() returns ByteArrayBuffer instead of a String
>> (2) ContentHandler#field takes ByteArrayBuffer instead of a String
> 
> Your patch does not look bad, but:
> 
> I have a feeling that it will not help with MIME4J-116, maybe on the
> contrary. You say you do not want AbstractEntity to create Field
> instances yet you give no rationale for your opinion.. 

I believe I did, but I can certainly re-iterate.

Tight coupling is generally considered bad, and I _personally_ tend to agree. By
coupling EntityStateMachine with the Field class we will end up coupling
it, albeit indirectly, with pretty much the entire
org.apache.james.mime4j.field package:

EntityStateMachine -> Field -> DefaultFieldParser -> several concrete 
implementations of FieldParser

This is a very _bad_ idea, in my opinion.

I could live with it, if Field were an interface.

> I don't
> see how the ByteArrayBuffer can help with the unnecessary duplicate
> field parsing.
> 

It was not meant to. To me, these two issues are related, but not the same.


> I have no problem with small changes one at a time. But it does not
> hurt to keep an eye on related issues in the process.
> 
> Also I don't like that ByteArrayBuffer is a mutable class (and yes, I
> am aware that byte[] is mutable too). Please consider that Field is
> designed to be immutable. This is an important aspect because a Field
> may be shared between multiple messages. So if you refactor Field to
> store a ByteArrayBuffer please make sure that the ByteArrayBuffer does
> not get exposed publicly.
> 
> Maybe we should introduce an immutable object that holds a byte array;
> similar to what String is for a character array. That would still not
> help with #116 though..
> 

Yes, this is a good idea, but in this case the class would have to make 
a copy of the byte array _every_ time someone wanted to get access to 
the underlying raw content, for instance, in order to parse it using a 
different parser or write it out to an OutputStream. This is the only 
way I can see to ensure immutability of such a class. Interestingly
enough, this can get quite expensive if it occurs frequently.
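One way to soften that copying cost is to expose read access without handing out the array: `byteAt` and `writeTo` need no copy, and only `toByteArray` does. This is a hypothetical sketch, not the mime4j class. Note that `writeTo` passes the internal array to the stream, so it trusts the `OutputStream` not to retain or modify it; that is an assumption, not a guarantee.

```java
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical immutable byte holder, analogous to String for chars.
final class ImmutableBytes {
    private final byte[] data;

    ImmutableBytes(byte[] source) {
        this.data = source.clone(); // one defensive copy at construction
    }

    int length() {
        return data.length;
    }

    byte byteAt(int index) {
        return data[index]; // no copy; the array access checks bounds
    }

    // Copying accessor for callers that need a byte[] of their own.
    byte[] toByteArray() {
        return data.clone();
    }

    // Streams the content without an intermediate copy.
    void writeTo(OutputStream out) throws IOException {
        out.write(data, 0, data.length);
    }
}
```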

We solved a similar issue in HttpCore in the following way.

We have Header interface that represents an immutable HTTP header and we 
have FormattedHeader interface extending Header that provides access to 
the underlying CharArrayBuffer. One would have to cast Header to 
FormattedHeader in order to be able to mutate its content.
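The HttpCore pattern could be carried over to MIME fields roughly as follows. `MimeField`, `BufferedMimeField` and `ParsedMimeField` are hypothetical stand-ins for HttpCore's `Header` / `FormattedHeader`, and, unlike the mutable buffer access described above, this variation hands out only a read-only `ByteBuffer` view.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical analogue of Header: the immutable view every caller sees.
interface MimeField {
    String getName();
    String getBody();
}

// Hypothetical analogue of FormattedHeader: raw access for callers that
// know to cast, e.g. a writer or a handler that re-parses the bytes.
interface BufferedMimeField extends MimeField {
    ByteBuffer getBuffer(); // raw "Name: body" bytes, read-only
    int getBodyPos();       // offset of the body within the buffer
}

final class ParsedMimeField implements BufferedMimeField {
    private final byte[] raw;
    private final int colon;

    ParsedMimeField(byte[] raw) {
        this.raw = raw.clone();
        int c = -1;
        for (int i = 0; i < this.raw.length; i++) {
            if (this.raw[i] == ':') {
                c = i;
                break;
            }
        }
        if (c < 0) {
            throw new IllegalArgumentException("no colon in field");
        }
        this.colon = c;
    }

    public String getName() {
        return new String(raw, 0, colon, StandardCharsets.US_ASCII);
    }

    public String getBody() {
        int pos = getBodyPos();
        return new String(raw, pos, raw.length - pos, StandardCharsets.US_ASCII);
    }

    public ByteBuffer getBuffer() {
        return ByteBuffer.wrap(raw).asReadOnlyBuffer();
    }

    public int getBodyPos() {
        return colon + 1;
    }
}
```

Callers that only need name and body depend on `MimeField`; code that wants the raw bytes checks `instanceof BufferedMimeField`, which keeps the common interface immutable.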

Oleg

> Markus
> 
>> If these changes are okay with everyone I'll proceed with refactoring and change Field and friends to utilize ByteArrayBuffer for storing raw field content.
>>
>> Oleg
>>
>>
>>> MIME stream parser handles non-ASCII fields incorrectly
>>> -------------------------------------------------------
>>>
>>>                 Key: MIME4J-118
>>>                 URL: https://issues.apache.org/jira/browse/MIME4J-118
>>>             Project: JAMES Mime4j
>>>          Issue Type: Bug
>>>            Reporter: Oleg Kalnichevski
>>>            Assignee: Oleg Kalnichevski
>>>             Fix For: 0.6
>>>
>>>         Attachments: mime4j-118.patch
>>>
>>>
>>> Presently MIME stream parser handles non-ASCII fields incorrectly. Binary field content gets converted to its textual representation too early in the parsing process using simple byte to char cast. The decision about appropriate char encoding should be left up to individual ContentHandler implementations.
>>> Oleg

Re: [jira] Updated: (MIME4J-118) MIME stream parser handles non-ASCII fields incorrectly

Posted by Markus Wiederkehr <ma...@gmail.com>.

Let me reply to the mailing list to keep the discussion on the detail
level away from the JIRA..

On Tue, Feb 17, 2009 at 12:29 PM, Oleg Kalnichevski (JIRA)
<mi...@james.apache.org> wrote:
>
>     [ https://issues.apache.org/jira/browse/MIME4J-118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>
> Oleg Kalnichevski updated MIME4J-118:
> -------------------------------------
>
>    Attachment: mime4j-118.patch
>
> Here is the first cut at fixing the problem with charset coding in MIME fields. The significant changes are
>
> (1) EntityStateMachine#getField() returns ByteArrayBuffer instead of a String
> (2) ContentHandler#field takes ByteArrayBuffer instead of a String

Your patch does not look bad, but:

I have a feeling that it will not help with MIME4J-116; it may even
make matters worse. You say you do not want AbstractEntity to create
Field instances, yet you give no rationale for that. I don't see how
ByteArrayBuffer can help with the unnecessary duplicate field
parsing.

I have no problem with small changes one at a time. But it does not
hurt to keep an eye on related issues in the process.

Also I don't like that ByteArrayBuffer is a mutable class (and yes, I
am aware that byte[] is mutable too). Please consider that Field is
designed to be immutable. This is an important aspect because a Field
may be shared between multiple messages. So if you refactor Field to
store a ByteArrayBuffer please make sure that the ByteArrayBuffer does
not get exposed publicly.

Maybe we should introduce an immutable object that holds a byte array,
similar to what String is for a character array. That would still not
help with #116, though.
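A minimal sketch of such an immutable byte holder, assuming a hypothetical class named ImmutableBytes (the ByteSequence type mime4j later adopted may look different). Defensive copies on the way in and on the way out mean an instance can be shared between messages safely, which is the property Field needs.

```java
import java.util.Arrays;

// Hypothetical immutable byte container, analogous to String for char[].
// Not mime4j's ByteSequence; names and methods are illustrative only.
final class ImmutableBytes {
    private final byte[] data;

    ImmutableBytes(byte[] source, int offset, int length) {
        // Defensive copy: later changes to 'source' cannot leak in.
        this.data = Arrays.copyOfRange(source, offset, offset + length);
    }

    ImmutableBytes(byte[] source) {
        this(source, 0, source.length);
    }

    int length() {
        return data.length;
    }

    byte byteAt(int index) {
        return data[index];
    }

    // Returns a copy, so callers cannot mutate a shared instance.
    byte[] toByteArray() {
        return data.clone();
    }
}
```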

Markus

> If these changes are okay with everyone I'll proceed with refactoring and change Field and friends to utilize ByteArrayBuffer for storing raw field content.
>
> Oleg
>
>
>> MIME stream parser handles non-ASCII fields incorrectly
>> -------------------------------------------------------
>>
>>                 Key: MIME4J-118
>>                 URL: https://issues.apache.org/jira/browse/MIME4J-118
>>             Project: JAMES Mime4j
>>          Issue Type: Bug
>>            Reporter: Oleg Kalnichevski
>>            Assignee: Oleg Kalnichevski
>>             Fix For: 0.6
>>
>>         Attachments: mime4j-118.patch
>>
>>
>> Presently MIME stream parser handles non-ASCII fields incorrectly. Binary field content gets converted to its textual representation too early in the parsing process using simple byte to char cast. The decision about appropriate char encoding should be left up to individual ContentHandler implementations.
>> Oleg

[jira] Updated: (MIME4J-118) MIME stream parser handles non-ASCII fields incorrectly

Posted by "Markus Wiederkehr (JIRA)" <mi...@james.apache.org>.

     [ https://issues.apache.org/jira/browse/MIME4J-118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Wiederkehr updated MIME4J-118:
-------------------------------------

    Attachment: mime4j-118-bytesequence.patch

Here is the new patch, please review.

Note that support for different writing modes has been completely removed.

> MIME stream parser handles non-ASCII fields incorrectly
> -------------------------------------------------------
>
>                 Key: MIME4J-118
>                 URL: https://issues.apache.org/jira/browse/MIME4J-118
>             Project: JAMES Mime4j
>          Issue Type: Bug
>            Reporter: Oleg Kalnichevski
>            Assignee: Oleg Kalnichevski
>             Fix For: 0.6
>
>         Attachments: mime4j-118-bytesequence.patch, mime4j-118-field.patch, mimej4-118.patch
>
>
> Presently MIME stream parser handles non-ASCII fields incorrectly. Binary field content gets converted to its textual representation too early in the parsing process using simple byte to char cast. The decision about appropriate char encoding should be left up to individual ContentHandler implementations.
> Oleg


[jira] Updated: (MIME4J-118) MIME stream parser handles non-ASCII fields incorrectly

Posted by "Oleg Kalnichevski (JIRA)" <mi...@james.apache.org>.

     [ https://issues.apache.org/jira/browse/MIME4J-118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Oleg Kalnichevski updated MIME4J-118:
-------------------------------------

    Attachment:     (was: mime4j-118.patch)

> MIME stream parser handles non-ASCII fields incorrectly
> -------------------------------------------------------
>
>                 Key: MIME4J-118
>                 URL: https://issues.apache.org/jira/browse/MIME4J-118
>             Project: JAMES Mime4j
>          Issue Type: Bug
>            Reporter: Oleg Kalnichevski
>            Assignee: Oleg Kalnichevski
>             Fix For: 0.6
>
>         Attachments: mime4j-118-field.patch
>
>
> Presently MIME stream parser handles non-ASCII fields incorrectly. Binary field content gets converted to its textual representation too early in the parsing process using simple byte to char cast. The decision about appropriate char encoding should be left up to individual ContentHandler implementations.
> Oleg


[jira] Commented: (MIME4J-118) MIME stream parser handles non-ASCII fields incorrectly

Posted by "Markus Wiederkehr (JIRA)" <mi...@james.apache.org>.

    [ https://issues.apache.org/jira/browse/MIME4J-118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676087#action_12676087 ] 

Markus Wiederkehr commented on MIME4J-118:
------------------------------------------

As I have outlined before I would prefer a solution where Field exposes the original data as a sequence of bytes instead of as a String. This way MessageWriter could simply use that data as is.
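A rough sketch of what writing raw field data as-is could look like, assuming hypothetical RawField and RawHeaderWriter classes rather than mime4j's actual Field and MessageWriter. Because the writer copies the stored bytes verbatim, it never has to pick a charset, and non-ASCII content survives a parse/write round trip.

```java
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical, simplified field that keeps the exact bytes read from the input.
final class RawField {
    private final byte[] raw;  // name, colon and body exactly as parsed, without CRLF

    RawField(byte[] raw) {
        this.raw = raw.clone();
    }

    byte[] getRaw() {
        return raw.clone();
    }
}

final class RawHeaderWriter {
    private static final byte[] CRLF = {'\r', '\n'};

    // Copies each field's original bytes to the output unchanged, so no
    // charset decision is needed and a parse/write round trip preserves them.
    static void writeHeader(Iterable<RawField> fields, OutputStream out) throws IOException {
        for (RawField f : fields) {
            out.write(f.getRaw());
            out.write(CRLF);
        }
        out.write(CRLF);  // empty line terminates the header section
    }
}
```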

Also it seems that the lenient parsing is not done correctly in DefaultBodyDescriptor because it only uses the Content-Type charset to decode header fields that come after Content-Type. Header field parsing would have to be deferred until Content-Type is encountered for this to work as expected.

By the way, I'm with Stefano on this one: I don't think we have to implement something like that until someone asks for it. Also, deferred header field parsing can always be done in a custom ContentHandler implementation (in endHeader()).
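Deferred parsing in endHeader() might look roughly like this. The sketch assumes a handler that receives raw field bytes; DeferredHeaderHandler is a hypothetical name, the charset lookup is a deliberately naive regular expression rather than a real Content-Type parser, and US-ASCII is an assumed fallback. Because decoding waits until the whole header has been seen, fields that precede Content-Type are decoded with its charset too.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical handler: buffers raw fields, decodes them all in endHeader().
final class DeferredHeaderHandler {
    // Naive charset extraction; a real implementation would parse Content-Type properly.
    private static final Pattern CHARSET = Pattern.compile(
            "charset\\s*=\\s*\"?([A-Za-z0-9][A-Za-z0-9._:-]*)\"?", Pattern.CASE_INSENSITIVE);

    private final List<byte[]> rawFields = new ArrayList<>();
    private final List<String> decoded = new ArrayList<>();

    void field(byte[] raw) {
        rawFields.add(raw.clone());  // only buffer here; decode later
    }

    void endHeader() {
        Charset charset = StandardCharsets.US_ASCII;  // assumed fallback
        // Content-Type may appear anywhere in the header, so scan all fields first.
        for (byte[] raw : rawFields) {
            String text = new String(raw, StandardCharsets.ISO_8859_1);
            if (text.toLowerCase(Locale.ROOT).startsWith("content-type:")) {
                Matcher m = CHARSET.matcher(text);
                if (m.find() && Charset.isSupported(m.group(1))) {
                    charset = Charset.forName(m.group(1));
                }
            }
        }
        for (byte[] raw : rawFields) {
            decoded.add(new String(raw, charset));
        }
    }

    List<String> getDecodedFields() {
        return decoded;
    }
}
```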

> MIME stream parser handles non-ASCII fields incorrectly
> -------------------------------------------------------
>
>                 Key: MIME4J-118
>                 URL: https://issues.apache.org/jira/browse/MIME4J-118
>             Project: JAMES Mime4j
>          Issue Type: Bug
>            Reporter: Oleg Kalnichevski
>            Assignee: Oleg Kalnichevski
>             Fix For: 0.6
>
>         Attachments: mime4j-118-field.patch, mimej4-118.patch
>
>
> Presently MIME stream parser handles non-ASCII fields incorrectly. Binary field content gets converted to its textual representation too early in the parsing process using simple byte to char cast. The decision about appropriate char encoding should be left up to individual ContentHandler implementations.
> Oleg


[jira] Updated: (MIME4J-118) MIME stream parser handles non-ASCII fields incorrectly

Posted by "Oleg Kalnichevski (JIRA)" <mi...@james.apache.org>.

     [ https://issues.apache.org/jira/browse/MIME4J-118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Oleg Kalnichevski updated MIME4J-118:
-------------------------------------

    Attachment: mime4j-118.patch

Here is the first cut at fixing the problem with charset coding in MIME fields. The significant changes are:

(1) EntityStateMachine#getField() returns ByteArrayBuffer instead of a String
(2) ContentHandler#field takes ByteArrayBuffer instead of a String

If these changes are okay with everyone, I'll proceed with the refactoring and change Field and friends to use ByteArrayBuffer for storing raw field content.
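The problem this patch addresses can be illustrated with a small Java sketch. It assumes a simplified callback that receives the raw field bytes (a byte[] standing in for ByteArrayBuffer); RawFieldHandler, CharsetAwareHandler and FieldDecoding are hypothetical names. castDecode mimics the old byte-to-char cast, which amounts to ISO-8859-1 and garbles UTF-8, whereas the handler decodes with a charset of its own choosing.

```java
import java.nio.charset.Charset;

// Hypothetical, simplified callback: the parser hands over raw bytes.
interface RawFieldHandler {
    void field(byte[] raw);
}

final class FieldDecoding {
    // Mimics the old behavior: one byte becomes one char. This is ISO-8859-1
    // in disguise and turns multi-byte UTF-8 sequences into mojibake.
    static String castDecode(byte[] raw) {
        StringBuilder sb = new StringBuilder(raw.length);
        for (byte b : raw) {
            sb.append((char) (b & 0xff));
        }
        return sb.toString();
    }
}

// The handler, not the parser, decides which charset to apply.
final class CharsetAwareHandler implements RawFieldHandler {
    private final Charset charset;
    String lastField;

    CharsetAwareHandler(Charset charset) {
        this.charset = charset;
    }

    @Override
    public void field(byte[] raw) {
        lastField = new String(raw, charset);
    }
}
```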

Oleg


> MIME stream parser handles non-ASCII fields incorrectly
> -------------------------------------------------------
>
>                 Key: MIME4J-118
>                 URL: https://issues.apache.org/jira/browse/MIME4J-118
>             Project: JAMES Mime4j
>          Issue Type: Bug
>            Reporter: Oleg Kalnichevski
>            Assignee: Oleg Kalnichevski
>             Fix For: 0.6
>
>         Attachments: mime4j-118.patch
>
>
> Presently MIME stream parser handles non-ASCII fields incorrectly. Binary field content gets converted to its textual representation too early in the parsing process using simple byte to char cast. The decision about appropriate char encoding should be left up to individual ContentHandler implementations.
> Oleg


[jira] Updated: (MIME4J-118) MIME stream parser handles non-ASCII fields incorrectly

Posted by "Oleg Kalnichevski (JIRA)" <mi...@james.apache.org>.

     [ https://issues.apache.org/jira/browse/MIME4J-118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Oleg Kalnichevski updated MIME4J-118:
-------------------------------------

    Attachment: mimej4-118.patch

Change log:

* Field interface implementations store raw content as the original ByteArrayBuffer produced by the parser.
* AbstractField classes also retain the original charset of the parser. This enables reliable reconstruction of the original text.
* RawFieldImpl parses the body lazily, only when it is accessed.
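Lazy body parsing, as described in the last item, can be sketched like this. LazyField is a hypothetical class, not the RawFieldImpl from the patch; it keeps the raw bytes and the parser's charset, and only decodes and unfolds the body on first access. The cache is not synchronized, so the sketch assumes single-threaded use.

```java
import java.nio.charset.Charset;

// Hypothetical field that defers decoding and unfolding until getBody() is called.
final class LazyField {
    private final byte[] raw;
    private final Charset charset;  // charset the parser used, retained for decoding
    private final int colon;
    private String body;            // null until first access
    int parseCount;                 // demonstration only: how often the body was parsed

    LazyField(byte[] raw, Charset charset) {
        int idx = -1;
        for (int i = 0; i < raw.length; i++) {
            if (raw[i] == ':') {
                idx = i;
                break;
            }
        }
        if (idx <= 0) {
            throw new IllegalArgumentException("missing field name");
        }
        this.raw = raw.clone();
        this.charset = charset;
        this.colon = idx;
    }

    String getName() {
        return new String(raw, 0, colon, charset);
    }

    String getBody() {
        if (body == null) {
            parseCount++;
            String s = new String(raw, colon + 1, raw.length - colon - 1, charset);
            // Unfold: drop line breaks that precede whitespace, then trim.
            body = s.replaceAll("\r?\n(?=[ \t])", "").trim();
        }
        return body;
    }
}
```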

This patch should resolve the issue. Please review.

Oleg

> MIME stream parser handles non-ASCII fields incorrectly
> -------------------------------------------------------
>
>                 Key: MIME4J-118
>                 URL: https://issues.apache.org/jira/browse/MIME4J-118
>             Project: JAMES Mime4j
>          Issue Type: Bug
>            Reporter: Oleg Kalnichevski
>            Assignee: Oleg Kalnichevski
>             Fix For: 0.6
>
>         Attachments: mime4j-118-field.patch, mimej4-118.patch
>
>
> Presently MIME stream parser handles non-ASCII fields incorrectly. Binary field content gets converted to its textual representation too early in the parsing process using simple byte to char cast. The decision about appropriate char encoding should be left up to individual ContentHandler implementations.
> Oleg


[jira] Updated: (MIME4J-118) MIME stream parser handles non-ASCII fields incorrectly

Posted by "Markus Wiederkehr (JIRA)" <mi...@james.apache.org>.

     [ https://issues.apache.org/jira/browse/MIME4J-118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Wiederkehr updated MIME4J-118:
-------------------------------------

    Attachment: mime4j-118-bytesequence-draft.patch

Oleg, here is a draft of a patch to show what I mean. It is partly based on your code.

Not all unit tests pass because with this approach the different writing modes no longer make sense.

Notice how ByteSequenceUtil checks whether a ByteSequence is actually a ByteArrayBuffer; this is the performance optimization you suggested.
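The instanceof fast path can be illustrated with simplified stand-ins: ByteSeq, GrowableBuffer and ByteSeqUtil are hypothetical names, not mime4j's ByteSequence, ByteArrayBuffer and ByteSequenceUtil. When the sequence is known to be array-backed, one bulk copy replaces a per-byte loop through the interface.

```java
import java.util.Arrays;

// Hypothetical minimal byte-sequence interface.
interface ByteSeq {
    int length();
    byte byteAt(int index);
}

// Hypothetical array-backed, growable implementation.
final class GrowableBuffer implements ByteSeq {
    private byte[] buf = new byte[16];
    private int len;

    void append(byte b) {
        if (len == buf.length) {
            buf = Arrays.copyOf(buf, len * 2);  // grow by doubling
        }
        buf[len++] = b;
    }

    byte[] backingArray() {
        return buf;
    }

    @Override public int length() { return len; }

    @Override public byte byteAt(int index) {
        if (index < 0 || index >= len) {
            throw new IndexOutOfBoundsException(String.valueOf(index));
        }
        return buf[index];
    }
}

final class ByteSeqUtil {
    static byte[] toByteArray(ByteSeq seq) {
        if (seq instanceof GrowableBuffer) {
            // Fast path: a single bulk copy of the backing array.
            GrowableBuffer gb = (GrowableBuffer) seq;
            return Arrays.copyOf(gb.backingArray(), gb.length());
        }
        // Generic path: byte by byte through the interface.
        byte[] out = new byte[seq.length()];
        for (int i = 0; i < out.length; i++) {
            out[i] = seq.byteAt(i);
        }
        return out;
    }
}
```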

This patch does not include a lenient parsing mode. Please let me know if something like this is acceptable; if so, I'll clean it up and add a few more details.

> MIME stream parser handles non-ASCII fields incorrectly
> -------------------------------------------------------
>
>                 Key: MIME4J-118
>                 URL: https://issues.apache.org/jira/browse/MIME4J-118
>             Project: JAMES Mime4j
>          Issue Type: Bug
>            Reporter: Oleg Kalnichevski
>            Assignee: Oleg Kalnichevski
>             Fix For: 0.6
>
>         Attachments: mime4j-118-bytesequence-draft.patch, mime4j-118-field.patch, mimej4-118.patch
>
>
> Presently MIME stream parser handles non-ASCII fields incorrectly. Binary field content gets converted to its textual representation too early in the parsing process using simple byte to char cast. The decision about appropriate char encoding should be left up to individual ContentHandler implementations.
> Oleg


[jira] Commented: (MIME4J-118) MIME stream parser handles non-ASCII fields incorrectly

Posted by "Robert Burrell Donkin (JIRA)" <mi...@james.apache.org>.

    [ https://issues.apache.org/jira/browse/MIME4J-118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676270#action_12676270 ] 

Robert Burrell Donkin commented on MIME4J-118:
----------------------------------------------

I suspect that there may be longer-term issues with this general approach, but I think we should accept that the current proposal is good enough for this release. Release early, release often.

I think that the best approach is to preserve the original document together with boundary metadata, that is, to record that a 'Content-Type' header starts at byte 99 in the document, rather than to slice up the document and reassemble it from lots of small byte buffers. But this is related to other issues which should wait until after this release, so I think we should patch and look to ship.
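A sketch of the offset-based alternative, assuming hypothetical OriginalDocument and FieldSpan classes: the document is stored once and a field is described only by its byte offset and length. Folded header lines and other RFC 5322 details are ignored to keep it short.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical (offset, length) descriptor pointing into the original document.
final class FieldSpan {
    final int offset;
    final int length;

    FieldSpan(int offset, int length) {
        this.offset = offset;
        this.length = length;
    }
}

// Hypothetical holder of the single, unsliced copy of the document.
final class OriginalDocument {
    private final byte[] doc;

    OriginalDocument(byte[] doc) {
        this.doc = doc.clone();
    }

    // Returns the span of the first header line starting with "name:"
    // (ASCII, case-insensitive), or null. Folded lines are not handled.
    FieldSpan find(String name) {
        String prefix = name.toLowerCase(Locale.ROOT) + ":";
        int lineStart = 0;
        for (int i = 0; i <= doc.length; i++) {
            boolean eol = i == doc.length || doc[i] == '\n';
            if (!eol) {
                continue;
            }
            int end = (i > lineStart && doc[i - 1] == '\r') ? i - 1 : i;
            if (end == lineStart) {
                return null;  // empty line ends the header section
            }
            String line = new String(doc, lineStart, end - lineStart, StandardCharsets.ISO_8859_1);
            if (line.toLowerCase(Locale.ROOT).startsWith(prefix)) {
                return new FieldSpan(lineStart, end - lineStart);
            }
            lineStart = i + 1;
        }
        return null;
    }

    // Copies bytes out only when a caller actually needs them.
    byte[] bytesOf(FieldSpan s) {
        return Arrays.copyOfRange(doc, s.offset, s.offset + s.length);
    }
}
```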

> MIME stream parser handles non-ASCII fields incorrectly
> -------------------------------------------------------
>
>                 Key: MIME4J-118
>                 URL: https://issues.apache.org/jira/browse/MIME4J-118
>             Project: JAMES Mime4j
>          Issue Type: Bug
>            Reporter: Oleg Kalnichevski
>            Assignee: Oleg Kalnichevski
>             Fix For: 0.6
>
>         Attachments: mime4j-118-bytesequence-draft.patch, mime4j-118-field.patch, mimej4-118.patch
>
>
> Presently MIME stream parser handles non-ASCII fields incorrectly. Binary field content gets converted to its textual representation too early in the parsing process using simple byte to char cast. The decision about appropriate char encoding should be left up to individual ContentHandler implementations.
> Oleg


[jira] Commented: (MIME4J-118) MIME stream parser handles non-ASCII fields incorrectly

Posted by "Oleg Kalnichevski (JIRA)" <mi...@james.apache.org>.

    [ https://issues.apache.org/jira/browse/MIME4J-118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676097#action_12676097 ] 

Oleg Kalnichevski commented on MIME4J-118:
------------------------------------------

I took a cursory look at the patch without applying it and as far as I could tell everything looked all right. Feel free to go ahead with your approach.

Oleg

> MIME stream parser handles non-ASCII fields incorrectly
> -------------------------------------------------------
>
>                 Key: MIME4J-118
>                 URL: https://issues.apache.org/jira/browse/MIME4J-118
>             Project: JAMES Mime4j
>          Issue Type: Bug
>            Reporter: Oleg Kalnichevski
>            Assignee: Oleg Kalnichevski
>             Fix For: 0.6
>
>         Attachments: mime4j-118-bytesequence-draft.patch, mime4j-118-field.patch, mimej4-118.patch
>
>
> Presently MIME stream parser handles non-ASCII fields incorrectly. Binary field content gets converted to its textual representation too early in the parsing process using simple byte to char cast. The decision about appropriate char encoding should be left up to individual ContentHandler implementations.
> Oleg


[jira] Commented: (MIME4J-118) MIME stream parser handles non-ASCII fields incorrectly

Posted by "Oleg Kalnichevski (JIRA)" <mi...@james.apache.org>.

    [ https://issues.apache.org/jira/browse/MIME4J-118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677124#action_12677124 ] 

Oleg Kalnichevski commented on MIME4J-118:
------------------------------------------

Looks OK to me. I think ByteSequence and EmptyByteSequence should go into o.a.j.mime4j.util, though.

Oleg

> MIME stream parser handles non-ASCII fields incorrectly
> -------------------------------------------------------
>
>                 Key: MIME4J-118
>                 URL: https://issues.apache.org/jira/browse/MIME4J-118
>             Project: JAMES Mime4j
>          Issue Type: Bug
>            Reporter: Oleg Kalnichevski
>            Assignee: Oleg Kalnichevski
>             Fix For: 0.6
>
>         Attachments: mime4j-118-bytesequence.patch, mime4j-118-field.patch, mimej4-118.patch
>
>
> Presently MIME stream parser handles non-ASCII fields incorrectly. Binary field content gets converted to its textual representation too early in the parsing process using simple byte to char cast. The decision about appropriate char encoding should be left up to individual ContentHandler implementations.
> Oleg


[jira] Assigned: (MIME4J-118) MIME stream parser handles non-ASCII fields incorrectly

Posted by "Oleg Kalnichevski (JIRA)" <mi...@james.apache.org>.

     [ https://issues.apache.org/jira/browse/MIME4J-118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Oleg Kalnichevski reassigned MIME4J-118:
----------------------------------------

    Assignee: oleg.kalnichevski

Working on a patch.

Oleg

> MIME stream parser handles non-ASCII fields incorrectly
> -------------------------------------------------------
>
>                 Key: MIME4J-118
>                 URL: https://issues.apache.org/jira/browse/MIME4J-118
>             Project: JAMES Mime4j
>          Issue Type: Bug
>            Reporter: Oleg Kalnichevski
>            Assignee: oleg.kalnichevski
>             Fix For: 0.6
>
>
> Presently MIME stream parser handles non-ASCII fields incorrectly. Binary field content gets converted to its textual representation too early in the parsing process using simple byte to char cast. The decision about appropriate char encoding should be left up to individual ContentHandler implementations.
> Oleg


[jira] Updated: (MIME4J-118) MIME stream parser handles non-ASCII fields incorrectly

Posted by "Markus Wiederkehr (JIRA)" <mi...@james.apache.org>.

     [ https://issues.apache.org/jira/browse/MIME4J-118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Wiederkehr updated MIME4J-118:
-------------------------------------

    Attachment:     (was: mime4j-118-bytesequence-draft.patch)

> MIME stream parser handles non-ASCII fields incorrectly
> -------------------------------------------------------
>
>                 Key: MIME4J-118
>                 URL: https://issues.apache.org/jira/browse/MIME4J-118
>             Project: JAMES Mime4j
>          Issue Type: Bug
>            Reporter: Oleg Kalnichevski
>            Assignee: Oleg Kalnichevski
>             Fix For: 0.6
>
>         Attachments: mime4j-118-field.patch, mimej4-118.patch
>
>
> Presently MIME stream parser handles non-ASCII fields incorrectly. Binary field content gets converted to its textual representation too early in the parsing process using simple byte to char cast. The decision about appropriate char encoding should be left up to individual ContentHandler implementations.
> Oleg


[jira] Updated: (MIME4J-118) MIME stream parser handles non-ASCII fields incorrectly

Posted by "Oleg Kalnichevski (JIRA)" <mi...@james.apache.org>.

     [ https://issues.apache.org/jira/browse/MIME4J-118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Oleg Kalnichevski updated MIME4J-118:
-------------------------------------

    Attachment: mime4j-118-field.patch

Markus,

Let's try a different approach to the problem. The new patch changes the representation of a MIME field in the API by replacing the name/value/raw tuple with the Field interface. If you like this patch better, I'll look into changing the representation of raw field content from String to ByteArrayBuffer or a similar immutable class. As a next step, I would look into resolving MIME4J-116.
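A before/after sketch of that API change, assuming simplified, hypothetical interfaces (TupleHandler, ParsedField, FieldHandler) rather than mime4j's real signatures. Passing one field object means a handler can no longer receive a name, value and raw string that disagree, and the raw representation could later move from String to bytes without changing the callback signature.

```java
import java.util.ArrayList;
import java.util.List;

// Old style (hypothetical): three loosely related strings per field.
interface TupleHandler {
    void field(String name, String value, String raw);
}

// New style (hypothetical): one object owns name, body and raw form.
interface ParsedField {
    String getName();
    String getBody();
    String getRaw();
}

interface FieldHandler {
    void field(ParsedField field);
}

final class SimpleParsedField implements ParsedField {
    private final String raw;
    private final int colon;

    SimpleParsedField(String raw) {
        int idx = raw.indexOf(':');
        if (idx <= 0) {
            throw new IllegalArgumentException("missing field name");
        }
        this.raw = raw;
        this.colon = idx;
    }

    @Override public String getName() { return raw.substring(0, colon); }
    @Override public String getBody() { return raw.substring(colon + 1).trim(); }
    @Override public String getRaw() { return raw; }
}

// Example handler that only cares about field names.
final class CollectingHandler implements FieldHandler {
    final List<String> names = new ArrayList<>();

    @Override
    public void field(ParsedField field) {
        names.add(field.getName());
    }
}
```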

Oleg

> MIME stream parser handles non-ASCII fields incorrectly
> -------------------------------------------------------
>
>                 Key: MIME4J-118
>                 URL: https://issues.apache.org/jira/browse/MIME4J-118
>             Project: JAMES Mime4j
>          Issue Type: Bug
>            Reporter: Oleg Kalnichevski
>            Assignee: Oleg Kalnichevski
>             Fix For: 0.6
>
>         Attachments: mime4j-118-field.patch, mime4j-118.patch
>
>
> Presently MIME stream parser handles non-ASCII fields incorrectly. Binary field content gets converted to its textual representation too early in the parsing process using simple byte to char cast. The decision about appropriate char encoding should be left up to individual ContentHandler implementations.
> Oleg


[jira] Resolved: (MIME4J-118) MIME stream parser handles non-ASCII fields incorrectly

Posted by "Markus Wiederkehr (JIRA)" <mi...@james.apache.org>.

     [ https://issues.apache.org/jira/browse/MIME4J-118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Wiederkehr resolved MIME4J-118.
--------------------------------------

    Resolution: Fixed
      Assignee: Markus Wiederkehr  (was: Oleg Kalnichevski)

> MIME stream parser handles non-ASCII fields incorrectly
> -------------------------------------------------------
>
>                 Key: MIME4J-118
>                 URL: https://issues.apache.org/jira/browse/MIME4J-118
>             Project: JAMES Mime4j
>          Issue Type: Bug
>            Reporter: Oleg Kalnichevski
>            Assignee: Markus Wiederkehr
>             Fix For: 0.6
>
>         Attachments: mime4j-118-bytesequence.patch, mime4j-118-field.patch, mimej4-118.patch
>
>
> Presently MIME stream parser handles non-ASCII fields incorrectly. Binary field content gets converted to its textual representation too early in the parsing process using simple byte to char cast. The decision about appropriate char encoding should be left up to individual ContentHandler implementations.
> Oleg


[jira] Closed: (MIME4J-118) MIME stream parser handles non-ASCII fields incorrectly

Posted by "Oleg Kalnichevski (JIRA)" <mi...@james.apache.org>.

     [ https://issues.apache.org/jira/browse/MIME4J-118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Oleg Kalnichevski closed MIME4J-118.
------------------------------------


Reviewed.

Oleg

> MIME stream parser handles non-ASCII fields incorrectly
> -------------------------------------------------------
>
>                 Key: MIME4J-118
>                 URL: https://issues.apache.org/jira/browse/MIME4J-118
>             Project: JAMES Mime4j
>          Issue Type: Bug
>            Reporter: Oleg Kalnichevski
>            Assignee: Markus Wiederkehr
>             Fix For: 0.6
>
>         Attachments: mime4j-118-bytesequence.patch, mime4j-118-field.patch, mimej4-118.patch
>
>
> Presently MIME stream parser handles non-ASCII fields incorrectly. Binary field content gets converted to its textual representation too early in the parsing process using simple byte to char cast. The decision about appropriate char encoding should be left up to individual ContentHandler implementations.
> Oleg
