Posted to mime4j-dev@james.apache.org by Lukáš Vlček <lu...@gmail.com> on 2011/12/07 17:00:02 UTC

Is it possible to have this mail parsed correctly?

Hi,

The following is the EML source of a short mail:
https://gist.github.com/5a9b383c1dc048fac6d4

The following is a link to the public pipermail (Mailman) rendering
of the same mail:
http://lists.jboss.org/pipermail/jboss-cluster-dev/2008-April/000000.html

Note that the signature in the footer of the email contains the name
"Zamarreño".

When using mime4j, I get "ZamarreÃ±o" instead (tested with both 0.6
and 0.7.1).

Is mime4j able to parse this mail the same way Mailman (2.1.9) does?

Regards,
Lukas

Re: Is it possible to have this mail parsed correctly?

Posted by Oleg Kalnichevski <ol...@apache.org>.

On Mon, 2011-12-12 at 10:52 +0100, Lukáš Vlček wrote:
> Hi Stefano,
> 
> Thanks for the analysis. I extracted this use case to the following test:
> https://github.com/lukas-vlcek/mime4j-test/blob/master/src/test/java/org/mime4j/test/BasicTest.java#L45
> 
> Now, the question is, why Mailman is able to render the output correctly if
> the charset and used encoding in the body are not in sync. May be the
> encoding of the message file has been changed when I copied the file from
> the server to my local dev machine... or it is just coincidence? I do not
> know... just thinking out loud...
> 
> Regards,
> Lukas
> 

This is a hex dump of the message. It suggests that the message body
is UTF-8 encoded, while the Content-Type header declares ISO-8859-1 as
the charset.

00000860   6E 79 6F 6E  65 20 74 68  65 72 65 3F  20 3A 29 0A  0A 2D 2D 0A  47 61 6C 64  65 72 20 5A  61 6D 61 72   nyone there? :)..--.Galder Zamar
00000880   72 65 C3 B1  6F 0A 53 72  2E 20 53 6F  66 74 77 61  72 65 20 4D  61 69 6E 74  65 6E 61 6E  63 65 20 45   re..o.Sr. Software Maintenance E

It may be that the message was modified while being copied, or that
Mailman employs some sort of content type / charset detection mechanism.
In any case, mime4j decoded the message correctly based on its metadata.
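
To illustrate what such a detection mechanism could look like, here is a
minimal sketch in plain Java. It is only an assumption about the general
idea, not Mailman's actual behaviour and not a mime4j API: the class
LenientBodyDecoder and its decode method are hypothetical names. It tries
a strict UTF-8 decode first and falls back to the declared charset only
if the bytes are not valid UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of mime4j or Mailman.
class LenientBodyDecoder {

    // Prefer UTF-8 when the body bytes are valid UTF-8; otherwise
    // trust the charset declared in the Content-Type header.
    static String decode(byte[] body, Charset declared) {
        CharsetDecoder strictUtf8 = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            // Throws instead of silently substituting U+FFFD.
            return strictUtf8.decode(ByteBuffer.wrap(body)).toString();
        } catch (CharacterCodingException e) {
            return new String(body, declared);
        }
    }

    public static void main(String[] args) {
        byte[] utf8Bytes = {(byte) 0xC3, (byte) 0xB1};   // UTF-8 sequence from the dump
        byte[] latin1Bytes = {(byte) 0xF1};              // single ISO-8859-1 byte
        System.out.println(decode(utf8Bytes, StandardCharsets.ISO_8859_1));
        System.out.println(decode(latin1Bytes, StandardCharsets.ISO_8859_1));
    }
}
```

Note that this is a heuristic: a genuine ISO-8859-1 body that happens to
contain the characters "Ã±" would be misread as UTF-8, which is one
reason a parser may prefer to follow the declared metadata.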

Oleg 


> On Fri, Dec 9, 2011 at 4:35 PM, Stefano Bagnara <ap...@bago.org> wrote:
> 
> > 2011/12/7 Lukáš Vlček <lu...@gmail.com>:
> > > Hi,
> > >
> > > The following is a eml source of a short mail:
> > > https://gist.github.com/5a9b383c1dc048fac6d4
> > >
> > > The following is a link to public (Mailman) pipermail rendered
> > > representation of the same mail:
> > > http://lists.jboss.org/pipermail/jboss-cluster-dev/2008-April/000000.html
> > >
> > > Note how the sign in the footer of the email contains name "Zamarreño".
> > >
> > > When using mime4j I am getting "ZamarreÃ±o" instead (tested with both 0.6
> > > and 0.7.1).
> > >
> > > Is mime4j able to parse this mail the same way as Mailman (2.1.9) can do it?
> >
> > mime4j is doing the right thing.
> > The message declares the charset as ISO-8859-1 and then use an UTF8
> > sequence.
> > So if you really want to use ñ in an ISO-8859-1 message make sure you
> > also use the right bytes (F1 is the right ISO-8859-1 instead "C3 B1"
> > is the UTF8 sequence).
> >
> > The gist is displayed correctly on your browser because your browser
> > uses utf8 to show it to you: force it to ISO-8859-1 and you will see
> > the same sequence that mime4j gives you.
> >
> > Stefano
> >
> > > Regards,
> > > Lukas
> >

Re: Is it possible to have this mail parsed correctly?

Posted by Lukáš Vlček <lu...@gmail.com>.

Hi Stefano,

Thanks for the analysis. I extracted this use case to the following test:
https://github.com/lukas-vlcek/mime4j-test/blob/master/src/test/java/org/mime4j/test/BasicTest.java#L45

Now the question is why Mailman is able to render the output correctly if
the declared charset and the encoding actually used in the body are not in
sync. Maybe the encoding of the message file was changed when I copied the
file from the server to my local dev machine... or is it just a
coincidence? I do not know... just thinking out loud...

Regards,
Lukas

On Fri, Dec 9, 2011 at 4:35 PM, Stefano Bagnara <ap...@bago.org> wrote:

> 2011/12/7 Lukáš Vlček <lu...@gmail.com>:
> > Hi,
> >
> > The following is a eml source of a short mail:
> > https://gist.github.com/5a9b383c1dc048fac6d4
> >
> > The following is a link to public (Mailman) pipermail rendered
> > representation of the same mail:
> > http://lists.jboss.org/pipermail/jboss-cluster-dev/2008-April/000000.html
> >
> > Note how the sign in the footer of the email contains name "Zamarreño".
> >
> > When using mime4j I am getting "ZamarreÃ±o" instead (tested with both 0.6
> > and 0.7.1).
> >
> > Is mime4j able to parse this mail the same way as Mailman (2.1.9) can do it?
>
> mime4j is doing the right thing.
> The message declares the charset as ISO-8859-1 and then use an UTF8
> sequence.
> So if you really want to use ñ in an ISO-8859-1 message make sure you
> also use the right bytes (F1 is the right ISO-8859-1 instead "C3 B1"
> is the UTF8 sequence).
>
> The gist is displayed correctly on your browser because your browser
> uses utf8 to show it to you: force it to ISO-8859-1 and you will see
> the same sequence that mime4j gives you.
>
> Stefano
>
> > Regards,
> > Lukas
>

Re: Is it possible to have this mail parsed correctly?

Posted by Stefano Bagnara <ap...@bago.org>.

2011/12/7 Lukáš Vlček <lu...@gmail.com>:
> Hi,
>
> The following is a eml source of a short mail:
> https://gist.github.com/5a9b383c1dc048fac6d4
>
> The following is a link to public (Mailman) pipermail rendered
> representation of the same mail:
> http://lists.jboss.org/pipermail/jboss-cluster-dev/2008-April/000000.html
>
> Note how the sign in the footer of the email contains name "Zamarreño".
>
> When using mime4j I am getting "ZamarreÃ±o" instead (tested with both 0.6
> and 0.7.1).
>
> Is mime4j able to parse this mail the same way as Mailman (2.1.9) can do it?

mime4j is doing the right thing.
The message declares the charset as ISO-8859-1 and then uses a UTF-8
sequence. So if you really want to use ñ in an ISO-8859-1 message, make
sure you also use the right bytes (F1 is the correct ISO-8859-1 byte,
whereas "C3 B1" is the UTF-8 sequence).

The gist is displayed correctly in your browser because your browser
uses UTF-8 to show it to you: force it to ISO-8859-1 and you will see
the same sequence that mime4j gives you.
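
The effect can be reproduced without mime4j at all. The following is a
small sketch using only the Java standard library; the class name
CharsetMismatchDemo and its methods are made up for illustration. It
decodes the bytes C3 B1 6F (the end of "Zamarreño" as found in the
message body) once with the declared charset and once as UTF-8.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it does not use or represent any mime4j API.
class CharsetMismatchDemo {

    // The last bytes of the name as they appear in the message body.
    static final byte[] BODY_BYTES = {(byte) 0xC3, (byte) 0xB1, 0x6F};

    // Decode as the Content-Type header declares (ISO-8859-1):
    // every byte becomes exactly one character.
    static String decodeAsDeclared(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // Decode as the bytes were actually written (UTF-8):
    // C3 B1 together form the single character U+00F1.
    static String decodeAsUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Contains U+00C3, U+00B1 and 'o', i.e. the garbled form.
        System.out.println(decodeAsDeclared(BODY_BYTES));
        // Contains U+00F1 and 'o', i.e. the intended text.
        System.out.println(decodeAsUtf8(BODY_BYTES));
        // In ISO-8859-1 the same letter is the single byte F1.
        byte[] latin1 = "\u00F1".getBytes(StandardCharsets.ISO_8859_1);
        System.out.printf("%02X%n", latin1[0] & 0xFF);
    }
}
```

What the console shows depends on its own encoding, but the string
contents are as described in the comments.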

Stefano

> Regards,
> Lukas

Re: Is it possible to have this mail parsed correctly?

Posted by Oleg Kalnichevski <ol...@apache.org>.

On Wed, 2011-12-07 at 17:00 +0100, Lukáš Vlček wrote:
> Hi,
> 
> The following is a eml source of a short mail:
> https://gist.github.com/5a9b383c1dc048fac6d4
> 
> The following is a link to public (Mailman) pipermail rendered
> representation of the same mail:
> http://lists.jboss.org/pipermail/jboss-cluster-dev/2008-April/000000.html
> 
> Note how the sign in the footer of the email contains name "Zamarreño".
> 
> When using mime4j I am getting "ZamarreÃ±o" instead (tested with both 0.6
> and 0.7.1).
> 
> Is mime4j able to parse this mail the same way as Mailman (2.1.9) can do it?
> 
> Regards,
> Lukas

Same here. I see no reason why mime4j should fail to parse this message,
but if you are reasonably sure this is the case, please raise a JIRA
issue and provide the message as a binary attachment.

Oleg