Posted to general@jakarta.apache.org by sk...@tripod.com on 2000/02/03 21:09:43 UTC

Re: How to get JSP to work with Non-Latin languages


Hi,
I've been noticing a lot of messages on the list about JSP pages not being able
to display foreign (i.e. non-Latin) characters properly. I ran into this problem
last December and, after debugging Jasper (the JSP engine), found a problem
with the EscapeUnicodeWriter. I proposed a fix on this list
and the changes were accepted:
>   Revision  Changes    Path
>   1.2       +4 -4      jakarta-tomcat/src/share/org/apache/jasper/compiler/EscapeUnicodeWriter.java
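The message does not say what the revision changed. Judging only by its name, EscapeUnicodeWriter presumably rewrites non-ASCII characters in the generated servlet source as Java \uXXXX escapes, so the .java file stays pure ASCII whatever encoding the compiler assumes. What follows is a minimal sketch of that general technique under that assumption; the class UnicodeEscapingWriter is hypothetical and is not the Jasper source.

```java
import java.io.FilterWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical sketch of a Unicode-escaping writer; NOT the Jasper class.
// ASCII passes through unchanged; every other char is written as a
// backslash-u escape with four hex digits, which javac decodes back to the
// same char regardless of the encoding it assumes for the source file.
public class UnicodeEscapingWriter extends FilterWriter {

    public UnicodeEscapingWriter(Writer out) {
        super(out);
    }

    @Override
    public void write(int c) throws IOException {
        char ch = (char) c; // Writer contract: only the low 16 bits are used
        if (ch < 0x80) {
            out.write(ch);
        } else {
            // Each UTF-16 code unit is escaped separately, so surrogate
            // pairs become two escapes, which is valid Java source.
            out.write(String.format("\\u%04x", (int) ch));
        }
    }

    @Override
    public void write(char[] cbuf, int off, int len) throws IOException {
        for (int i = off; i < off + len; i++) {
            write(cbuf[i]);
        }
    }

    @Override
    public void write(String str, int off, int len) throws IOException {
        for (int i = off; i < off + len; i++) {
            write(str.charAt(i));
        }
    }

    public static void main(String[] args) throws IOException {
        StringWriter generated = new StringWriter();
        try (Writer w = new UnicodeEscapingWriter(generated)) {
            // A line of generated servlet code containing one Hangul syllable.
            w.write("out.print(\"\uD55C\");");
        }
        System.out.println(generated);
    }
}
```

Because the escapes are resolved by the Java compiler itself, this approach sidesteps the question of which encoding the intermediate .java file is written in.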

I am currently running JSP pages successfully in Korean. If you get the latest
version of Jakarta (or version 3.0 plus revision 1.2 of EscapeUnicodeWriter), you
should be able to display foreign characters correctly. Also, remember that you
will need to set the contentType attribute of the JSP page directive as follows:

<%@ page contentType="text/html; charset=Encoding" %>
where Encoding is the character encoding of the page. For example, for Korean JSP pages:
<%@ page contentType="text/html; charset=EUC_KR" %>
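To illustrate why the charset matters, here is a small sketch using only the standard java.nio.charset API (the class PageCharsetDemo is hypothetical, not Jasper code). It encodes the same Korean text with EUC_KR and with ISO-8859-1, and assumes the JVM provides the EUC-KR charset, as standard JDK builds do.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: how the page charset changes the bytes sent to the browser.
public class PageCharsetDemo {

    // Encode the page text with the charset named in contentType.
    // Charset.forName also accepts the Java name "EUC_KR" as an alias of "EUC-KR".
    static byte[] encodePage(String text, String charsetName) {
        return text.getBytes(Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // Three Hangul syllables, written as Unicode escapes so that the
        // encoding of this .java file itself does not matter.
        String korean = "\uD55C\uAD6D\uC5B4";

        byte[] eucKr = encodePage(korean, "EUC_KR");
        byte[] latin1 = encodePage(korean, "ISO-8859-1");

        System.out.println("EUC_KR:     " + eucKr.length + " bytes"); // 2 bytes per syllable
        System.out.println("ISO-8859-1: " + new String(latin1, StandardCharsets.ISO_8859_1)); // unmappable chars become '?'
    }
}
```

EUC_KR keeps all three syllables (two bytes each), while ISO-8859-1 has no Hangul and substitutes '?' for each one, which matches the kind of output reported on the list.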

Another thing to keep in mind is the character encoding used by the Java virtual
machine, which defaults to the encoding of the platform. It determines the encoding
that Readers use to decode bytes into characters and that Writers use to encode
characters into bytes. It also affects other methods such as String.getBytes().
If you are running on an English operating system, the default encoding of the
Java virtual machine will be a Latin one (for example 8859_1, or Cp1252 on Windows).
You can change this (as I did) using the file.encoding property. This is a system
property that can be set for a given Java virtual machine with the -D switch.
For example, in the tomcat.bat file I changed
start java org.apache.tomcat.shell.Startup %2 %3 %4 %5 %6 %7 %8 %9
to
start java -Dfile.encoding=EUC_KR org.apache.tomcat.shell.Startup %2 %3 %4 %5 %6 %7 %8 %9
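A quick way to check what the JVM actually picked up, and to avoid depending on it, is sketched below. The class EncodingCheck is hypothetical: it prints the file.encoding property and the default charset, then round-trips Korean text through a Writer (chars to bytes) and a Reader (bytes to chars) with an explicit charset, which ignores the platform default entirely. Note that on JDK 18 and later (JEP 400) the default charset is UTF-8 and values of file.encoding other than UTF-8 or COMPAT are not officially supported, so the -Dfile.encoding workaround applies to older JVMs like the ones discussed here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Tomcat.
// Run e.g. as:  java -Dfile.encoding=EUC_KR EncodingCheck
public class EncodingCheck {

    // Reports what the JVM picked up at startup.
    static String describeDefaults() {
        String prop = System.getProperty("file.encoding");
        Charset def = Charset.defaultCharset();
        return "file.encoding=" + prop + ", defaultCharset=" + def.name();
    }

    // Writer encodes chars -> bytes; Reader decodes bytes -> chars.
    // Passing the charset explicitly bypasses the platform default.
    static String roundTrip(String text, Charset cs) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (Writer w = new OutputStreamWriter(bytes, cs)) {
                w.write(text); // unmappable chars are replaced with '?'
            }
            StringBuilder sb = new StringBuilder();
            try (Reader r = new InputStreamReader(new ByteArrayInputStream(bytes.toByteArray()), cs)) {
                char[] buf = new char[256];
                int n;
                while ((n = r.read(buf)) != -1) {
                    sb.append(buf, 0, n);
                }
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected with in-memory streams
        }
    }

    public static void main(String[] args) {
        // Three Hangul syllables, written as escapes so this file's own encoding doesn't matter.
        String korean = "\uD55C\uAD6D\uC5B4";
        System.out.println(describeDefaults());
        System.out.println("UTF-8 round trip ok:      " + korean.equals(roundTrip(korean, StandardCharsets.UTF_8)));
        System.out.println("ISO-8859-1 round trip ok: " + korean.equals(roundTrip(korean, StandardCharsets.ISO_8859_1)));
    }
}
```

Passing a Charset to every Reader and Writer (and to getBytes) makes the code independent of how the JVM was started, which is the more robust option when you control the code.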

Hope this helps :-)
Thanks,
-Sher




Re: How to get JSP to work with Non-Latin languages

Posted by MANDAR RAJE <ma...@pathfinder.eng.sun.com>.
I did a putback yesterday (but the commit message did not 
go out for some reason).

Here is what I have done:

* For (1), I use the encoding specified by the contentType attribute of
  the JSP's page directive, or "8859_1" by default. This is the
  encoding the JspReader uses to read the JSP file.

* For (2), the generated .java file, I always use "UTF8".

* If the contentType is specified using a directive, I set
  the HTTP "Content-Type" header to the given value.
  (This was somehow missing from the implementation.)

I tested this with different encodings and it seems to
work. If you could also double-check this, that would be
great.
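The two encoding rules above can be sketched roughly as follows. This is a hypothetical illustration (JspEncodingRules and its methods are made-up names), not the code from the putback; it leaves out setting the HTTP header, which needs the Servlet API, and it assumes a simple split on ';' is enough to find the charset parameter in contentType.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical sketch of the encoding rules; not the actual Jasper code.
public class JspEncodingRules {

    static final String DEFAULT_JSP_ENCODING = "8859_1";                // (1) default
    static final Charset JAVA_SOURCE_ENCODING = StandardCharsets.UTF_8; // (2) generated .java

    // (1) Charset parameter of the page directive's contentType, else 8859_1.
    static String jspInputEncoding(String contentType) {
        if (contentType != null) {
            for (String param : contentType.split(";")) {
                String p = param.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String cs = p.substring("charset=".length()).trim();
                    if (!cs.isEmpty()) {
                        return cs;
                    }
                }
            }
        }
        return DEFAULT_JSP_ENCODING;
    }

    // Reads the JSP source with the chosen encoding (the JspReader's job in Jasper).
    static String readJsp(InputStream in, String contentType) {
        Charset cs = Charset.forName(jspInputEncoding(contentType));
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(in, cs)) {
            char[] buf = new char[1024];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // (2) Bytes that would be written to the generated .java file.
    static byte[] javaSourceBytes(String generatedJava) {
        return generatedJava.getBytes(JAVA_SOURCE_ENCODING);
    }
}
```

For example, jspInputEncoding("text/html; charset=EUC_KR") yields "EUC_KR", while a page without a charset falls back to "8859_1".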
 
Thanks,
Mandar.

"Preston L. Bannister" wrote:
> 
> Sher has this exactly right.  I spotted this problem when porting Tomcat to
> run on EBCDIC systems, but am too busy at the moment to do anything about it.
> 
> The root problem is that there are three encodings that Jasper needs to
> be concerned with:
> 
> 1.  The input *.jsp file encoding.
> 2.  The intermediate *.java file encoding.
> 3.  The output HTML encoding.
> 
> The JSP spec covers (3) perfectly in the <%@ page %> directive.
> Probably the best choice for (2) is UTF8, though Jikes would need to support encodings.
> 
> For (1), the answer has to come from outside Jasper/Jakarta somehow. What encoding
> you use for input *.jsp files is entirely up to your application, and at present
> there is no support for this.
> 
> Sher's solution of setting -Dfile.encoding=whatever is an excellent workaround,
> though not a general answer.
>

RE: How to get JSP to work with Non-Latin languages

Posted by "Preston L. Bannister" <pr...@home.com>.
Sher has this exactly right.  I spotted this problem when porting Tomcat to
run on EBCDIC systems, but am too busy at the moment to do anything about it.

The root problem is that there are three encodings that Jasper needs to
be concerned with:

1.  The input *.jsp file encoding.
2.  The intermediate *.java file encoding.
3.  The output HTML encoding.

The JSP spec covers (3) perfectly in the <%@ page %> directive.
Probably the best choice for (2) is UTF8, though Jikes would need to support encodings.

For (1), the answer has to come from outside Jasper/Jakarta somehow. What encoding
you use for input *.jsp files is entirely up to your application, and at present
there is no support for this.

Sher's solution of setting -Dfile.encoding=whatever is an excellent workaround,
though not a general answer.
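The three encodings can be modeled as independent parameters, which also shows why the compiler has to be told that the intermediate .java file is UTF-8. In this hypothetical sketch (ThreeEncodings and render are made-up names, and no real compilation happens), the JSP encoding is passed in from outside, as suggested for (1); if the compiler reads the UTF-8 .java file as Latin-1, each Hangul syllable turns into three wrong characters.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical model of the three encodings; no real compilation happens here.
public class ThreeEncodings {

    /**
     * @param jspBytes        raw bytes of the JSP file
     * @param jspEncoding     (1) encoding of the JSP file, supplied from outside
     * @param compilerReadsAs encoding the Java compiler assumes for the .java file
     * @param outputEncoding  (3) charset from the page directive's contentType
     */
    static byte[] render(byte[] jspBytes, Charset jspEncoding,
                         Charset compilerReadsAs, Charset outputEncoding) {
        String page = new String(jspBytes, jspEncoding);          // (1) decode the JSP
        byte[] javaFile = page.getBytes(StandardCharsets.UTF_8);  // (2) write .java as UTF-8
        String compiled = new String(javaFile, compilerReadsAs);  // compiler reads it back
        return compiled.getBytes(outputEncoding);                 // (3) encode the response
    }

    public static void main(String[] args) {
        Charset eucKr = Charset.forName("EUC-KR");
        String korean = "\uD55C\uAD6D\uC5B4";
        byte[] jsp = korean.getBytes(eucKr); // a JSP saved in EUC-KR

        // Compiler told the .java file is UTF-8: the text survives.
        byte[] ok = render(jsp, eucKr, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        // Compiler assumes Latin-1 instead: each UTF-8 byte becomes its own char.
        byte[] garbled = render(jsp, eucKr, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);

        System.out.println(new String(ok, StandardCharsets.UTF_8).equals(korean));      // true
        System.out.println(new String(garbled, StandardCharsets.UTF_8).equals(korean)); // false
    }
}
```

Keeping the three parameters separate means a page can, for instance, be stored in EUC-KR but served as UTF-8, or the reverse, without touching the JVM's default encoding.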

