Posted to users@tomcat.apache.org by Anton Tagunov <at...@mail.cnt.ru> on 2003/09/08 02:45:28 UTC

Re[4]: Charset encoding issue

Hello, Lima!

lccb> I've found a message (at
lccb> http://w6.metronet.com/~wjm/tomcat/2001/Mar/msg00547.html) :

lccb> "Tomcat follows the HTML standard,
Hmm.., to me it looks like a browser issue, not Tomcat.
Hence it's a bit OT here, but still, we have started the
discussion :-)

(again, as I have suggested before, Lima, you may want
to spy on your browser-Tomcat traffic to see exactly which bytes
are transferred; then you'll know who is mangling the data,
Tomcat or the browser. My feeling is that it is the browser.)

lccb>  which explicitly declares that MIME type
lccb> "application/x-www-form-urlencoded" is suitable ONLY for transferring ASCII
practice seems to be different
lccb> (but will of course work for ISO 8859-1 as well).
look, it's already funny: according to the standard

  "application/x-www-form-urlencoded" is suitable ONLY for transferring ASCII

but according to this message the existing software

  will of course work for ISO 8859-1
  
did you enjoy this "of course"? ;-)
lccb>  See
lccb> http://www.w3.org/TR/html4/interact/forms.html#h-17.13.4.1
lccb> It says:

lccb> The content type "application/x-www-form-urlencoded" is inefficient
lccb> for sending large quantities of binary data or text containing non-ASCII
lccb> characters. The content type "multipart/form-data" should be used for
lccb> submitting forms that contain files, non-ASCII data, and binary data."

Yup, but in practice I believe that we have succeeded many times
in sending Cyrillic this way. The browser was running on Windows,
however. All the browsers (huh, do I remember it correctly?)
were using windows-1251 or koi8-r when the page was encoded with
the respective encoding.
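
To make this concrete, here is a minimal Java sketch (not Tomcat's or any
browser's actual code) of how one typed character becomes different bytes
under application/x-www-form-urlencoded depending on the charset used to
encode it, and what comes out if the receiver decodes with a different one.
The class name FormEncodingDemo and its helpers are made up for illustration.
The sketch assumes the receiving side decodes as ISO-8859-1 when the request
names no charset; verify that against your own setup.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class: shows charset-dependent percent-encoding.
final class FormEncodingDemo {

    // Percent-encode a form value using the given charset name.
    static String encode(String value, String charset) {
        try {
            return URLEncoder.encode(value, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    // Decode a percent-encoded form value using the given charset name.
    static String decode(String encoded, String charset) {
        try {
            return URLDecoder.decode(encoded, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        String typed = "\u00e1"; // the letter a with acute accent

        // Same character, two different wire representations.
        System.out.println(encode(typed, "ISO-8859-1")); // %E1
        System.out.println(encode(typed, "UTF-8"));      // %C3%A1

        // Sender used UTF-8, receiver assumes ISO-8859-1 (assumption of this
        // sketch): the one character comes back as two unrelated characters.
        String mangled = decode(encode(typed, "UTF-8"), "ISO-8859-1");
        System.out.println(mangled.length()); // 2
    }
}
```

If the captured request body contains %C3%A1 where %E1 was expected (or the
other way round), the sender and the receiver disagree about the charset, and
the trace tells you which side to look at.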

lccb> OS           : Red Hat Linux 9
I assume, both browser and Tomcat, right?
lccb> Browsers     : Galeon and Mozilla
lccb> Reg. Settings: English
lccb> Keyboard Set.: English (international)
lccb> Locale       : Not modified. The JVM is using [us,EN], I think. But that's
lccb>               OK because I prefer to test my application without changing
lccb> the locale to [pt,BR] (we never know when 2 webapps will run using
lccb> different locale settings)
What language are you typing in? What kind of characters get mangled?

lccb> ((3)) I don't know why "page contentType" + "form enctype multipart" is
lccb> the only working combination, but it's OK for me. I would just like to
lccb> understand it :-|
So do we :-)


Re: Re[4]: Charset encoding issue

Posted by Anton Tagunov <at...@mail.cnt.ru>.
Hello Daniel!

Great to know your name at last; I have not seen it
in your other posts! :-))

Sorry to have been addressing you by your surname!

DHAL>  How can i spy the traffic between Tomcat and the browsers ?

Just the question I've been waiting for =)
There's an overview of it on my http://tagunov.tripod.com
page, somewhere down there. Tell me if you can't find it there.
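
Independently of whatever that page describes, one way to see the raw bytes
is a small logging relay placed between the browser and Tomcat. The following
is only a sketch under stated assumptions: the class name TrafficDump, the
listening port 8081 and the Tomcat port 8080 are all made up for illustration,
and the code uses present-day Java syntax. A packet sniffer would do the same
job without any code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical helper: forwards TCP traffic and logs every byte as hex.
final class TrafficDump {

    // Render the first `length` bytes as lowercase hex pairs, e.g. "e1 41".
    static String hex(byte[] data, int length) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < length; i++) {
            if (i > 0) sb.append(' ');
            sb.append(String.format("%02x", data[i] & 0xff));
        }
        return sb.toString();
    }

    // Copy everything from `in` to `out`, logging each chunk to stderr.
    static void relay(InputStream in, OutputStream out, String tag) throws IOException {
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            System.err.println(tag + " " + hex(buf, n));
            out.write(buf, 0, n);
            out.flush();
        }
    }

    // Run relay() on a background thread; close the target when the source ends.
    static void pump(Socket from, Socket to, String tag) {
        Thread t = new Thread(() -> {
            try {
                relay(from.getInputStream(), to.getOutputStream(), tag);
            } catch (IOException closed) {
                // One side dropped the connection; nothing more to forward.
            } finally {
                try { to.close(); } catch (IOException ignored) { }
            }
        });
        t.setDaemon(true);
        t.start();
    }

    public static void main(String[] args) throws IOException {
        int listenPort = 8081; // assumption: a free local port
        int tomcatPort = 8080; // assumption: where Tomcat listens
        try (ServerSocket server = new ServerSocket(listenPort)) {
            while (true) {
                Socket browser = server.accept();
                Socket tomcat = new Socket("localhost", tomcatPort);
                pump(browser, tomcat, ">>"); // request bytes
                pump(tomcat, browser, "<<"); // response bytes
            }
        }
    }
}
```

With the relay running, point the browser at port 8081 instead of 8080 and
submit the form; the ">>" lines show the bytes the browser actually sent,
before Tomcat has touched them.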

Anton


Re[4]: Charset encoding issue

Posted by Daniel H A Lima <li...@cit.com.br>.
Anton Tagunov wrote:

>Hello, Lima!
>
Hi, Anton.

>
>lccb> I've found a message (at
>lccb> http://w6.metronet.com/~wjm/tomcat/2001/Mar/msg00547.html) :
>
>lccb> "Tomcat follows the HTML standard,
>Hmm.., to me it looks like a browser issue, not Tomcat.
>Hence it's a bit OT here, but still, we have started the
>discussion :-)
>
>(again, as I have suggested before, Lima, you may want
>to spy on your browser-Tomcat traffic to see exactly which bytes
>are transferred; then you'll know who is mangling the data,
>Tomcat or the browser. My feeling is that it is the browser.)
>
 How can I spy on the traffic between Tomcat and the browsers?

>
>lccb>  which explicitly declares that MIME type
>lccb> "application/x-www-form-urlencoded" is suitable ONLY for transferring ASCII
>practice seems to be different
>lccb> (but will of course work for ISO 8859-1 as well).
>look, it's already funny: according to the standard
>
>  "application/x-www-form-urlencoded" is suitable ONLY for transferring ASCII
>
>but according to this message the existing software
>
>  will of course work for ISO 8859-1
>  
>did you enjoy this "of course"? ;-)
>lccb>  See
>lccb> http://www.w3.org/TR/html4/interact/forms.html#h-17.13.4.1
>lccb> It says:
>
>lccb> The content type "application/x-www-form-urlencoded" is inefficient
>lccb> for sending large quantities of binary data or text containing non-ASCII
>lccb> characters. The content type "multipart/form-data" should be used for
>lccb> submitting forms that contain files, non-ASCII data, and binary data."
>
>Yup, but in practice I believe that we have succeeded many times
>in sending Cyrillic this way. The browser was running on Windows,
>however. All the browsers (huh, do I remember it correctly?)
>were using windows-1251 or koi8-r when the page was encoded with
>the respective encoding.
>
?

>
>lccb> OS           : Red Hat Linux 9
>I assume, both browser and Tomcat, right?
>
    Galeon, Mozilla and Tomcat are all running on Linux.

>lccb> Browsers     : Galeon and Mozilla
>lccb> Reg. Settings: English
>lccb> Keyboard Set.: English (international)
>lccb> Locale       : Not modified. The JVM is using [us,EN], I think. But that's
>lccb>               OK because I prefer to test my application without changing
>lccb> the locale to [pt,BR] (we never know when 2 webapps will run using
>lccb> different locale settings)
>What language are you typing in? What kind of characters get mangled?
>  
>
    Characters from Brazilian Portuguese, like 'á', 'ç', 'ã', ...

>lccb> ((3)) I don't know why "page contentType" + "form enctype multipart" is
>lccb> the only working combination, but it's OK for me. I would just like to
>lccb> understand it :-|
>So do we :-)
>
>  
>
    Thanks !