Posted to mime4j-dev@james.apache.org by Lukáš Vlček <lu...@gmail.com> on 2011/12/07 17:00:02 UTC
Is it possible to have this mail parsed correctly?
Hi,
The following is the .eml source of a short mail:
https://gist.github.com/5a9b383c1dc048fac6d4
The following is a link to public (Mailman) pipermail rendered
representation of the same mail:
http://lists.jboss.org/pipermail/jboss-cluster-dev/2008-April/000000.html
Note how the signature in the footer of the email contains the name
"Zamarreño".
When using mime4j I am getting "ZamarreÃ±o" instead (tested with both 0.6
and 0.7.1).
Is mime4j able to parse this mail the same way Mailman (2.1.9) does?
Regards,
Lukas
Re: Is it possible to have this mail parsed correctly?
Posted by Oleg Kalnichevski <ol...@apache.org>.
On Mon, 2011-12-12 at 10:52 +0100, Lukáš Vlček wrote:
> Hi Stefano,
>
> Thanks for the analysis. I extracted this use case to the following test:
> https://github.com/lukas-vlcek/mime4j-test/blob/master/src/test/java/org/mime4j/test/BasicTest.java#L45
>
> Now, the question is, why Mailman is able to render the output correctly if
> the charset and used encoding in the body are not in sync. May be the
> encoding of the message file has been changed when I copied the file from
> the server to my local dev machine... or it is just coincidence? I do not
> know... just thinking out loud...
>
> Regards,
> Lukas
>
This is a hex dump of the message. It suggests that the message body
is UTF-8 encoded, while the Content-Type header declares ISO-8859-1 as
the content charset.
00000860  6E 79 6F 6E 65 20 74 68 65 72 65 3F 20 3A 29 0A 0A 2D 2D 0A 47 61 6C 64 65 72 20 5A 61 6D 61 72  nyone there? :)..--.Galder Zamar
00000880  72 65 C3 B1 6F 0A 53 72 2E 20 53 6F 66 74 77 61 72 65 20 4D 61 69 6E 74 65 6E 61 6E 63 65 20 45  re..o.Sr. Software Maintenance E
It may be that the message was modified while being copied, or that
Mailman employs some sort of content type / charset detection mechanism.
In any case mime4j correctly decoded the message based on its metadata.
Oleg
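Whether Mailman really performs charset detection is not confirmed
anywhere in this thread. As an illustration only, a lenient reader could
try strict UTF-8 first and fall back to the declared charset when the
bytes are not valid UTF-8. The sketch below uses only the JDK; the class
and method names (LenientBodyDecoder, decode) are made up for this
example and are not part of mime4j or Mailman.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not a mime4j or Mailman API.
public class LenientBodyDecoder {

    // Try strict UTF-8 first; if the bytes are not valid UTF-8,
    // fall back to the charset declared in the Content-Type header.
    static String decode(byte[] body, Charset declared) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(body))
                    .toString();
        } catch (CharacterCodingException e) {
            return new String(body, declared);
        }
    }

    public static void main(String[] args) {
        // "re" + C3 B1 + "o", as in the hex dump: valid UTF-8.
        byte[] utf8Body = {0x72, 0x65, (byte) 0xC3, (byte) 0xB1, 0x6F};
        // "re" + F1 + "o": genuine ISO-8859-1, not valid UTF-8.
        byte[] latin1Body = {0x72, 0x65, (byte) 0xF1, 0x6F};

        String a = decode(utf8Body, StandardCharsets.ISO_8859_1);
        String b = decode(latin1Body, StandardCharsets.ISO_8859_1);
        // Both inputs end up as the same text under this heuristic.
        System.out.println(a.equals(b));
    }
}
```

This is a heuristic: text that happens to be valid in both encodings is
always read as UTF-8. A parser that follows the declared metadata, as
mime4j does here, avoids that ambiguity.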
> On Fri, Dec 9, 2011 at 4:35 PM, Stefano Bagnara <ap...@bago.org> wrote:
>
> > 2011/12/7 Lukáš Vlček <lu...@gmail.com>:
> > > Hi,
> > >
> > > The following is a eml source of a short mail:
> > > https://gist.github.com/5a9b383c1dc048fac6d4
> > >
> > > The following is a link to public (Mailman) pipermail rendered
> > > representation of the same mail:
> > >
> > http://lists.jboss.org/pipermail/jboss-cluster-dev/2008-April/000000.html
> > >
> > > Note how the sign in the footer of the email contains name "Zamarreño".
> > >
> > > When using mime4j I am getting "ZamarreÃ±o" instead (tested with both 0.6
> > > and 0.7.1).
> > >
> > > Is mime4j able to parse this mail the same way as Mailman (2.1.9) can do
> > it?
> >
> > mime4j is doing the right thing.
> > The message declares the charset as ISO-8859-1 and then use an UTF8
> > sequence.
> > So if you really want to use ñ in an ISO-8859-1 message make sure you
> > also use the right bytes (F1 is the right ISO-8859-1 instead "C3 B1"
> > is the UTF8 sequence).
> >
> > The gist is displayed correctly on your browser because your browser
> > uses utf8 to show it to you: force it to ISO-8859-1 and you will see
> > the same sequence that mime4j gives you.
> >
> > Stefano
> >
> > > Regards,
> > > Lukas
> >
Re: Is it possible to have this mail parsed correctly?
Posted by Lukáš Vlček <lu...@gmail.com>.
Hi Stefano,
Thanks for the analysis. I extracted this use case to the following test:
https://github.com/lukas-vlcek/mime4j-test/blob/master/src/test/java/org/mime4j/test/BasicTest.java#L45
Now the question is why Mailman is able to render the output correctly if
the declared charset and the encoding actually used in the body are not in
sync. Maybe the encoding of the message file was changed when I copied the
file from the server to my local dev machine... or is it just a coincidence?
I do not know... just thinking out loud...
Regards,
Lukas
On Fri, Dec 9, 2011 at 4:35 PM, Stefano Bagnara <ap...@bago.org> wrote:
> 2011/12/7 Lukáš Vlček <lu...@gmail.com>:
> > Hi,
> >
> > The following is a eml source of a short mail:
> > https://gist.github.com/5a9b383c1dc048fac6d4
> >
> > The following is a link to public (Mailman) pipermail rendered
> > representation of the same mail:
> >
> http://lists.jboss.org/pipermail/jboss-cluster-dev/2008-April/000000.html
> >
> > Note how the sign in the footer of the email contains name "Zamarreño".
> >
> > When using mime4j I am getting "ZamarreÃ±o" instead (tested with both 0.6
> > and 0.7.1).
> >
> > Is mime4j able to parse this mail the same way as Mailman (2.1.9) can do
> it?
>
> mime4j is doing the right thing.
> The message declares the charset as ISO-8859-1 and then use an UTF8
> sequence.
> So if you really want to use ñ in an ISO-8859-1 message make sure you
> also use the right bytes (F1 is the right ISO-8859-1 instead "C3 B1"
> is the UTF8 sequence).
>
> The gist is displayed correctly on your browser because your browser
> uses utf8 to show it to you: force it to ISO-8859-1 and you will see
> the same sequence that mime4j gives you.
>
> Stefano
>
> > Regards,
> > Lukas
>
Re: Is it possible to have this mail parsed correctly?
Posted by Stefano Bagnara <ap...@bago.org>.
2011/12/7 Lukáš Vlček <lu...@gmail.com>:
> Hi,
>
> The following is a eml source of a short mail:
> https://gist.github.com/5a9b383c1dc048fac6d4
>
> The following is a link to public (Mailman) pipermail rendered
> representation of the same mail:
> http://lists.jboss.org/pipermail/jboss-cluster-dev/2008-April/000000.html
>
> Note how the sign in the footer of the email contains name "Zamarreño".
>
> When using mime4j I am getting "ZamarreÃ±o" instead (tested with both 0.6
> and 0.7.1).
>
> Is mime4j able to parse this mail the same way as Mailman (2.1.9) can do it?
mime4j is doing the right thing.
The message declares the charset as ISO-8859-1 and then uses a UTF-8
sequence.
So if you really want to use ñ in an ISO-8859-1 message, make sure you
also use the right bytes (F1 is the right ISO-8859-1 byte, whereas "C3 B1"
is the UTF-8 sequence).
The gist is displayed correctly in your browser because your browser
uses UTF-8 to show it to you: force it to ISO-8859-1 and you will see
the same sequence that mime4j gives you.
Stefano
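The byte-level difference described above can be reproduced with the JDK
alone. This is a minimal sketch, not code from the thread or from mime4j;
the class name CharsetMismatchDemo and its helper methods are
hypothetical. It decodes the bytes seen in the hex dump (C3 B1) once as
ISO-8859-1 and once as UTF-8, and also decodes the single byte F1 as
ISO-8859-1.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it does not use mime4j at all.
public class CharsetMismatchDemo {

    // The name from the signature as stored in the message:
    // the n-with-tilde is the two-byte UTF-8 sequence C3 B1.
    static final byte[] UTF8_BYTES = {
        'Z', 'a', 'm', 'a', 'r', 'r', 'e', (byte) 0xC3, (byte) 0xB1, 'o'
    };

    // What a parser does when it trusts "charset=ISO-8859-1".
    static String decodeAsLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // What a UTF-8 viewer (for example a browser) does.
    static String decodeAsUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String asLatin1 = decodeAsLatin1(UTF8_BYTES);
        String asUtf8 = decodeAsUtf8(UTF8_BYTES);

        // 10 chars: C3 and B1 each become a separate character
        // (U+00C3 and U+00B1).
        System.out.println(asLatin1.length());
        // 9 chars: C3 B1 is read as the single character U+00F1.
        System.out.println(asUtf8.length());

        // In a real ISO-8859-1 message the same letter is the byte F1.
        String single = decodeAsLatin1(new byte[] {(byte) 0xF1});
        System.out.println(single.equals("\u00F1"));
    }
}
```

The first decoding corresponds to what mime4j returns for this message,
the second to what a UTF-8 viewer shows.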
> Regards,
> Lukas
Re: Is it possible to have this mail parsed correctly?
Posted by Oleg Kalnichevski <ol...@apache.org>.
On Wed, 2011-12-07 at 17:00 +0100, Lukáš Vlček wrote:
> Hi,
>
> The following is a eml source of a short mail:
> https://gist.github.com/5a9b383c1dc048fac6d4
>
> The following is a link to public (Mailman) pipermail rendered
> representation of the same mail:
> http://lists.jboss.org/pipermail/jboss-cluster-dev/2008-April/000000.html
>
> Note how the sign in the footer of the email contains name "Zamarreño".
>
> When using mime4j I am getting "ZamarreÃ±o" instead (tested with both 0.6
> and 0.7.1).
>
> Is mime4j able to parse this mail the same way as Mailman (2.1.9) can do it?
>
> Regards,
> Lukas
Same here. I see no reason why mime4j should fail to parse this message,
but if you are reasonably sure this is the case, please raise a JIRA issue
and provide the message as a binary attachment.
Oleg