Posted to users@tomcat.apache.org by Michael Schuerig <mi...@schuerig.de> on 2004/09/08 16:07:26 UTC

JDT-Compiler character encoding

I've tried the following four combinations of settings, where
jspx denotes the encoding declared and used in my jspx document, 
jsp-javaEncoding is the value declared in conf/web.xml, and jasper-out 
is the relevant line in the generated xxx_jspx.java.

(1)
jspx: ISO-8859-1
jsp-javaEncoding: not explicitly set
jasper-out:
      out.write("\tÀöÌÃ<84>Ã<96>Ã<9C>Ã<9F>\n");

(2)
jspx: UTF-8
jsp-javaEncoding: not explicitly set
jasper-out:
      out.write("\tÀöÌÃ<84>Ã<96>Ã<9C>Ã<9F>\n");

(3)
jspx: ISO-8859-1
jsp-javaEncoding: ISO-8859-1
jasper-out:
      out.write("\täöüÄÖÜß\n");

(4)
jspx: UTF-8
jsp-javaEncoding: ISO-8859-1
jasper-out:
      out.write("\täöüÄÖÜß\n");

Only (3) and (4) appear correctly in the browser as "äöüÄÖÜß" (German 
umlauts). I don't think setting the javaEncoding should be necessary 
here, but I may well be misunderstanding something.

Without any javaEncoding given, Jasper produces UTF-8 encoded Java 
source code, and the JDT compiler supposedly accepts UTF-8 as its 
default input encoding. I haven't verified the latter.

There seem to be two possible causes for the incorrect output: either 
the JDT compiler doesn't behave as advertised, i.e., it does not take 
UTF-8 as its default input encoding, *or* the JDT compiler produces 
character output in UTF-8 which is later erroneously treated as 
ISO-8859-1.
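
Both hypotheses come down to the same mechanism: bytes written as UTF-8 
get read back as a single-byte Latin encoding. A minimal sketch in plain 
Java of what that does to the test string (the class and method names 
are made up for illustration; this is not Jasper or JDT code):

```java
import java.nio.charset.StandardCharsets;

public class EncodingMismatch {
    // Encode as UTF-8, then (wrongly) decode the same bytes as ISO-8859-1.
    static String utf8ReadAsLatin1(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // The seven test characters, written as escapes so that this
        // source file itself is immune to the encoding problem.
        String umlauts = "\u00e4\u00f6\u00fc\u00c4\u00d6\u00dc\u00df";
        String garbled = utf8ReadAsLatin1(umlauts);

        // Every character takes two bytes in UTF-8, so the misread
        // string has twice as many characters as the original.
        System.out.println(umlauts.length() + " chars -> " + garbled.length() + " chars");
        for (char c : garbled.toCharArray()) {
            System.out.printf("\\u%04X ", (int) c);
        }
        System.out.println();
    }
}
```

Each umlaut turns into a two-character pair starting with U+00C3, which 
matches the doubled-up characters in the jasper-out lines of (1) and (2).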

Michael

-- 
Michael Schuerig           Contests between male toads over females are
mailto:michael@schuerig.de     often settled by the depth of the croak.
http://www.schuerig.de/michael/                    --John Maynard Smith

---------------------------------------------------------------------
To unsubscribe, e-mail: tomcat-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: tomcat-user-help@jakarta.apache.org


Re: JDT-Compiler character encoding

Posted by Michael Schuerig <mi...@schuerig.de>.
On Wednesday 08 September 2004 16:07, Michael Schuerig wrote:

> There seem to be two possible causes for the incorrect output
>
> the JDT compiler doesn't behave as advertised, i.e., it does not take
> UTF-8 as default input encoding. *Or* the JDT compiler produces
> character output in UTF-8 which is later erroneously treated as
> ISO-8859-1.

Precompiled with Ant javac, encoding="UTF-8":
java:
      out.write("\n\n    TEST\n    \n\tÀöÌÃ<84>Ã<96>Ã<9C>Ã<9F>\n\t\n\t");

decompiled class:
        out.write("\n\n    TEST\n    \n\t\344\366\374\304\326\334\337\n\t\n\t");


Server compiled (without javaEncoding set in web.xml):
java:
      out.write("\tÀöÌÃ<84>Ã<96>Ã<9C>Ã<9F>\n");
decompiled class:
      out.write("\t\303\u20AC\303\266\303\u0152\303\204\303\226\303\234\303\237\n");


Server compiled (with javaEncoding ISO-8859-1 set in web.xml):
java:
      out.write("\täöüÄÖÜß\n");
decompiled class:
      out.write("\t\344\366\374\304\326\334\337\n");


Something's amiss here. Apparently, by default the JDT compiler does not 
read the UTF-8 source as UTF-8; rather, it seems to expect ISO-8859-1.
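
One detail in the server-compiled constants may narrow this down. A 
strict ISO-8859-1 read of the UTF-8 bytes would give \244 and \274 as 
the second character for "ä" and "ü", but the decompiled class shows 
\u20AC and \u0152, which is where ISO-8859-15 maps bytes 0xA4 and 0xBC. 
The sketch below checks that reading. It assumes the JVM ships the 
ISO-8859-15 charset (common, but not guaranteed by the Java spec), and 
the class and helper names are hypothetical:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Latin9Check {
    // Encode s as UTF-8, then decode those bytes with the named charset.
    static String utf8ReadAs(String s, String charsetName) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, Charset.forName(charsetName));
    }

    // Print ASCII as-is and everything else as a four-digit hex escape.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c < 0x80) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04X", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String umlauts = "\u00e4\u00f6\u00fc\u00c4\u00d6\u00dc\u00df";
        System.out.println("ISO-8859-1 : " + escape(utf8ReadAs(umlauts, "ISO-8859-1")));
        System.out.println("ISO-8859-15: " + escape(utf8ReadAs(umlauts, "ISO-8859-15")));
    }
}
```

If the ISO-8859-15 line is the one that matches the decompiled string, 
the compiler was probably reading the source with a Latin-9 platform 
default (for example a de_DE@euro locale) rather than with a hard-wired 
ISO-8859-1. That is an inference from the byte values only, not 
something checked against the JDT sources.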

Now, is this a bug or am I misunderstanding something?

Michael

-- 
Michael Schuerig                 Nothing is as brilliantly adaptive
mailto:michael@schuerig.de       as selective stupidity.
http://www.schuerig.de/michael/    --A.O. Rorty, The Deceptive Self
