Posted to dev@tomcat.apache.org by Tagunov Anthony <at...@nnt.ru> on 2001/03/16 16:33:47 UTC

JOIN2: problem with nls

Hello, Developers!

I'm more than anxious to join in on this question.
There clearly is a problem for those of us using encodings other than Latin-1!

Please, Tomcat developers, could you shed some light on
what has been done, what can be done and what is planned so that
non-Latin-1 users are not left to deal with all of this by themselves?
What is the state of the servlet specs, and what are the plans?
Setting
          <form enctype="application/x-www-form-urlencoded; charset=UTF-8"
does not change the headers my IE5 sends to the server, and
          <form accept-charset=""
does not affect IE's behaviour either.
Adding a special parameter to the form looks ugly. It may contain either
the name of the encoding the page was served to the client in (browsers
often send the response back in the same encoding), or it may be a hidden
field with a set of test characters (not my idea) that shows how they
were recoded. This is ugly, but better than nothing.
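
A server-side check for the test-character idea might look roughly like the following minimal Java sketch. Nothing here comes from the servlet API: it assumes the form carries a hidden field with a known test string and that the servlet can obtain that field's raw bytes (after %xx unescaping, before character decoding); the class CharsetProbe, the method detect and the list of candidate encodings are hypothetical. It tries each candidate in turn and keeps the first one that reproduces the test string exactly.

```java
import java.nio.charset.Charset;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for the "hidden field with test characters" trick.
// Assumption: the servlet has the raw bytes of the hidden field's value
// (after %xx unescaping, before any character decoding).
public class CharsetProbe {

    // Returns the first candidate encoding under which rawProbe decodes to
    // exactly the expected test string, or null if none of them does.
    static String detect(byte[] rawProbe, String expected, List<String> candidates) {
        for (String name : candidates) {
            if (!Charset.isSupported(name)) {
                continue; // this JVM does not know the encoding; skip it
            }
            String decoded = new String(rawProbe, Charset.forName(name));
            if (decoded.equals(expected)) {
                return name;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Cyrillic test string, written with escapes to keep the source ASCII.
        String expected = "\u0416\u0451\u043b\u0443\u0434\u044c";
        List<String> candidates = Arrays.asList("UTF-8", "windows-1251", "KOI8-R");

        // Simulate a browser that submitted the form in windows-1251.
        byte[] sent = expected.getBytes(Charset.forName("windows-1251"));
        System.out.println(detect(sent, expected, candidates)); // windows-1251
    }
}
```

Because the comparison is exact, the test string should contain characters whose bytes differ in every candidate encoding; Cyrillic letters, for instance, are encoded differently in UTF-8, windows-1251 and koi8-r.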

I'm afraid that if the servlet specifications do not address and
solve this issue, many applications will need some buffer layer between
the servlet container and the code implementing the logic, a layer that
does nothing but fix the encoding.
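
Such a layer could be as small as one re-decoding step. A minimal Java sketch, assuming the container has already decoded the request bytes as ISO-8859-1 (if it used some other default encoding, bytes may already have been lost and this does not help); the class Reencode and the method fix are hypothetical names. ISO-8859-1 maps every byte 0x00-0xFF to exactly one character, so encoding the string back to ISO-8859-1 recovers the bytes the browser sent, which are then decoded with the encoding the page was really served in.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical "encoding fix-up" layer. Assumption: the container has
// already decoded the request bytes as ISO-8859-1, which maps each byte
// 0x00-0xFF to exactly one character U+0000-U+00FF, so nothing was lost.
public class Reencode {

    static String fix(String latin1Decoded, String realEncoding) {
        // Undo the container's decoding: get back the bytes the browser sent.
        byte[] raw = latin1Decoded.getBytes(StandardCharsets.ISO_8859_1);
        // Decode them again with the encoding the page was really served in.
        return new String(raw, Charset.forName(realEncoding));
    }

    public static void main(String[] args) {
        String original = "\u041f\u0440\u0438\u0432\u0435\u0442";
        // Simulate a container that decoded UTF-8 bytes as ISO-8859-1.
        String garbled = new String(original.getBytes(StandardCharsets.UTF_8),
                                    StandardCharsets.ISO_8859_1);
        System.out.println(garbled.equals(original));               // false
        System.out.println(fix(garbled, "UTF-8").equals(original)); // true
    }
}
```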

Developers, do you have any insight on this?

Maybe we non-Latin-1 users (aren't there more of us who want to use
UTF-8, windows-1252, windows-1251 or koi8-r than those confined to
Latin-1?) should write to other lists and addresses to draw attention
to this problem? What I want is:
max)    a reliable, automatic way of getting the encoding right (not possible,
         I suppose, due to the design of the browsers?)
min)    a way to tell the container that _this request has encoding xyz_ and
         a method of getting it decoded correctly
medium) support for these test-character sequences built into the specs

On Fri, 16 Mar 2001 11:31:38 +0200, Aleksandras Novikovas wrote:

>Hello All,
>
>I'm posting for the first time, so please tell me if I do something wrong...
>
>First of all, the problem description:
>I have a multilanguage application (where the user can change the charset dynamically).
>The problem arises when the user enters information in the selected language.
>After parsePostData in HttpUtils I get lots of "????" instead of text.
>I cannot rely on the default system encoding, because the application can add languages dynamically without recompilation.
>So I never know which encoding the system will need next.
>
>I have written some code to work around this problem and think it would be nice to have it in the standard package.
>Essentially, I've changed parsePostData by adding an encoding parameter.
>Now the programmer can choose which encoding the InputStream is interpreted in.
>I have tested it with windows-1257 (Baltic) and windows-1251 (Cyrillic), and it worked for me.
>If anyone finds any errors, please let me know.
>Here is the code of that method:
>
>////////////////////////////////////////////////////////////////////////////////
>// Parses data from an HTML form that the client sends to 
>// the server using the HTTP POST method and the 
>// <i>application/x-www-form-urlencoded</i> MIME type.
>//
>// <p>The data sent by the POST method contains key-value
>// pairs. A key can appear more than once in the POST data
>// with different values. However, the key appears only once in 
>// the hashtable, with its value being
>// an array of strings containing the multiple values sent
>// by the POST method.
>//
>// <p>The keys and values in the hashtable are stored in their
>// decoded form, so
>// any + characters are converted to spaces, and characters
>// sent in hexadecimal notation (like <i>%xx</i>) are
>// decoded using the specified encoding.
>//
>// @param len	an integer specifying the length,
>//				in bytes, of the
>//				<code>ServletInputStream</code>
>//				object that is also passed to this
>//				method
>// @param in	the <code>ServletInputStream</code>
>//				object that contains the data sent
>//				from the client
>// @param enc	a String specifying the character encoding
>//				of the <code>ServletInputStream</code>
>//				object
>//
>// @return		a <code>Hashtable</code> object built
>//				from the parsed key-value pairs
>//
>// @exception IllegalArgumentException	if the data
>//				sent by the POST method is invalid
>////////////////////////////////////////////////////////////////////////////////
>
>public Hashtable parsePostData (int len, ServletInputStream in, String enc)
>{
>	// XXX
>	// should a length of 0 be an IllegalArgumentException
>	
>	if (len <=0)
>	    return new Hashtable (); // cheap hack to return an empty hash
>
>	if (in == null) {
>	    throw new IllegalArgumentException ();
>	}
>
>	// Make sure we read the entire POSTed body.
>	byte [] postedBytes = new byte [len];
>	try {
>		int offset = 0;
>		do {
>			int inputLen = in.read (postedBytes, offset, len - offset);
>			if (inputLen <= 0) {
>				throw new IllegalArgumentException (lStrings.getString("err.io.short_read"));
>			}
>			offset += inputLen;
>		} while ((len - offset) > 0);
>	}
>	catch (IOException e) {
>		throw new IllegalArgumentException (e.getMessage ());
>	}
>
>	// Changed part: parse postedBytes directly, converting each
>	// escaped byte to a Unicode character in the requested encoding
>	// and building the final string
>	
>	StringBuffer sb = new StringBuffer ();
>	Integer unicodeInteger;
>	for (int i = 0; i < postedBytes.length; i++) {
>		String testString = new String (postedBytes, i, 1);
>		switch (testString.charAt (0)) {
>			case '+' :
>				sb.append (' ');
>				break;
>			case '%' :
>				try {
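>					// Decode the escaped byte in the requested encoding.
>					// Each byte is decoded on its own, so this suits single-byte
>					// encodings (windows-1251, windows-1257) but not UTF-8.
>					unicodeInteger = Integer.valueOf (new String (postedBytes, i + 1, 2), 16);
>					sb.append (new String (new byte [] {unicodeInteger.byteValue ()}, enc));
>					i += 2;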
>					i += 2;
>				}
>				catch (NumberFormatException e) {
>					throw new IllegalArgumentException ();
>				}
>				catch (UnsupportedEncodingException e) {
>					throw new IllegalArgumentException ();
>				}
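>				catch (IndexOutOfBoundsException e) {
>					// A '%' too close to the end of the data (truncated escape);
>					// new String (byte[], int, int) may report this as
>					// StringIndexOutOfBoundsException, hence the common superclass.
>					// Just append the rest and stop the loop.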
>					String rest = new String (postedBytes, i, postedBytes.length - i);
>					sb.append (rest);
>					i += rest.length ();
>				}
>				break;
>			default:
>				// No decoding with enc here: unescaped characters
>				// are expected to be plain ASCII
>				sb.append (new String (postedBytes, i, 1));
>				break;
>		}
>	}
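>	// Note: parseQueryString decodes '+' and %xx escapes again, so an
>	// escaped '&', '=', '+' or '%' in the data (e.g. %26) will be misread.
>	return (parseQueryString (sb.toString ()));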
>}
>
>
>Best regards,
>Aleksandras Novikovas Aleksandras.Novikovas@post.5ci.lt
>IT manager
>Baltic Logistic System Vilnius Ltd.
>Kirtumu 51, Vilnius, Lithuania
>Phone: +370-2-390874; FAX: +370-2-390899; Mobile: +370-99-21678
>
>
>
>
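
One caveat with the quoted method: it unescapes %xx first and then passes the result to parseQueryString, which splits on '&' and '=' and decodes '+' and %xx again, so a value containing an escaped '&' (%26) or '+' (%2B) comes out wrong. It also decodes each escaped byte separately, which only works for single-byte encodings. Below is a minimal Java sketch that avoids both problems by splitting first and then decoding the collected bytes of each key and value in one go. FormDecoder, parse and decode are hypothetical names, not part of HttpUtils; the Hashtable of String[] result merely mirrors what parsePostData returns.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Hashtable;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not part of HttpUtils. Parses an
// application/x-www-form-urlencoded body whose escaped bytes are in charset cs.
public class FormDecoder {

    static Hashtable<String, String[]> parse(byte[] body, Charset cs) {
        // A urlencoded body is plain ASCII; ISO-8859-1 turns it into a String
        // with one char per byte and loses nothing.
        String text = new String(body, StandardCharsets.ISO_8859_1);
        Map<String, List<String>> values = new LinkedHashMap<>();
        // Split on '&' and '=' BEFORE unescaping, so %26, %3D, %2B and %25
        // inside a value cannot be mistaken for separators.
        for (String pair : text.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String key = decode(eq < 0 ? pair : pair.substring(0, eq), cs);
            String value = eq < 0 ? "" : decode(pair.substring(eq + 1), cs);
            values.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        }
        // Same shape as HttpUtils.parsePostData: each key maps to all its values.
        Hashtable<String, String[]> result = new Hashtable<>();
        for (Map.Entry<String, List<String>> e : values.entrySet()) {
            result.put(e.getKey(), e.getValue().toArray(new String[0]));
        }
        return result;
    }

    // Unescapes one key or value: '+' becomes a space, %xx becomes one byte,
    // anything else is copied as a byte. The collected bytes are decoded with
    // cs in one go, so multi-byte sequences (e.g. UTF-8) are handled.
    static String decode(String token, Charset cs) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            if (c == '+') {
                bytes.write(' ');
            } else if (c == '%') {
                if (i + 2 >= token.length()) {
                    throw new IllegalArgumentException("truncated escape: " + token);
                }
                int hi = Character.digit(token.charAt(i + 1), 16);
                int lo = Character.digit(token.charAt(i + 2), 16);
                if (hi < 0 || lo < 0) {
                    throw new IllegalArgumentException("bad escape: " + token);
                }
                bytes.write((hi << 4) | lo);
                i += 2;
            } else {
                // Same byte the body contained (the text was built via ISO-8859-1).
                bytes.write(c);
            }
        }
        return new String(bytes.toByteArray(), cs);
    }

    public static void main(String[] args) {
        String body = "name=%D0%9F%D1%80%D0%B8%D0%B2%D0%B5%D1%82&op=a%26b%2Bc&op=x+y";
        Hashtable<String, String[]> r =
                parse(body.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
        System.out.println(r.get("name")[0].equals("\u041f\u0440\u0438\u0432\u0435\u0442")); // true
        System.out.println(String.join("|", r.get("op"))); // a&b+c|x y
    }
}
```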


best regards, Tagunov Anthony