Posted to users@tomcat.apache.org by Nikola Milutinovic <Ni...@ev.co.yu> on 2001/12/05 12:36:21 UTC

Character Encoding problems 2

Hi all.

It's me again, and my troubles are not resolved. I've created a simple test servlet:

----------------------------------------
import javax.servlet.*;
import java.io.*;

public class TestServlet extends GenericServlet {
  private static final String testText = "\uC5A0 \uC5A1 \uC486 \uC487 \uC48C \uC48D \uC490 \uC491 \uC5BD \uC5BE";
  PrintWriter out;

  public void service( ServletRequest req, ServletResponse res )
    throws javax.servlet.ServletException, java.io.IOException
    {
    res.setContentType("text/html; charset=ISO-8859-2");
    out = res.getWriter();
    out.print( "<html>\r\n<head><title>Test servlet</title>\r\n" );
    out.print( "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=iso-8859-2\">\r\n</head>\r\n" );
    out.print( "<body>\r\n<h1>Test</h1>\r\n<p>Let us see how this gets out</p>\r\n<p>\r\n<p>" );
    out.print( testText );
    out.print( "</p>\r\n</body>\r\n</html>" );
  }
}
----------------------------------------

This prints "?" instead of the expected characters. The same string prints the desired characters in an ordinary Java application.
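
One thing worth checking in the servlet above, offered as a hedged observation rather than a confirmed diagnosis: the escapes in testText look like UTF-8 byte pairs rather than Unicode code points. The UTF-8 bytes of U+0160 (S with caron) are C5 A0, whereas the escape for C5A0 names a Hangul syllable that ISO-8859-2 cannot represent, so an encoder would substitute "?". The sketch below uses only the JDK; the class and method names are made up for illustration.

```java
import java.nio.charset.Charset;

public class EscapeCheck {
    // Encode a string as ISO-8859-2; characters the charset cannot
    // represent are replaced by the encoder's default byte, '?'.
    static byte[] toLatin2(String s) {
        return s.getBytes(Charset.forName("ISO-8859-2"));
    }

    public static void main(String[] args) {
        String asPosted = "\uC5A0";   // first escape from the test servlet
        String codePoint = "\u0160";  // code point of S with caron
        System.out.printf("as posted:  0x%02X%n", toLatin2(asPosted)[0] & 0xFF);
        System.out.printf("code point: 0x%02X%n", toLatin2(codePoint)[0] & 0xFF);
    }
}
```

On a JDK that ships ISO-8859-2, the first line should show 0x3F (the "?" byte) and the second 0xA9, which is where ISO-8859-2 places S with caron.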

QUESTION 1
---------------
How can I get Tomcat to honour "charset=ISO-8859-2"?
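
For comparison, here is a minimal JDK-only sketch of the bytes a writer bound to ISO-8859-2 would be expected to emit. It does not use Tomcat or the servlet API: a ByteArrayOutputStream stands in for the response stream, the class name is hypothetical, and the sample input (U+0160 and U+0161, S and s with caron) is an assumption chosen for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;

public class Latin2WriterSketch {
    // Print text through a PrintWriter bound to the named charset
    // and return the raw bytes that reached the underlying stream.
    static byte[] render(String text, String charsetName) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintWriter out = new PrintWriter(
                new OutputStreamWriter(sink, Charset.forName(charsetName)));
        out.print(text);
        out.flush();
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        for (byte b : render("\u0160 \u0161", "ISO-8859-2")) {
            System.out.printf("%02X ", b & 0xFF);
        }
        System.out.println();
    }
}
```

Comparing these bytes with what actually arrives from the server (for example in a saved response) can help tell an encoding problem in the container from a problem in the input string.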

QUESTION 2
---------------
What about static HTML? Suppose I enter part of the static HTML in Latin-2 encoding. That gets translated to a string, and a string is supposed to be Unicode. Do those strings get translated from "pageEncoding" to Unicode?
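
In plain Java, bytes only become a String through a decoder that is told the source encoding. The sketch below shows that step in isolation, assuming ISO-8859-2 input. Whether a given JSP compiler applies pageEncoding in the same way is not something this example demonstrates, and the class name is hypothetical.

```java
import java.nio.charset.Charset;

public class DecodeSketch {
    // Turn raw bytes into a Unicode String using the named source encoding.
    static String decode(byte[] raw, String charsetName) {
        return new String(raw, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // S with caron and c with acute as they are stored in ISO-8859-2
        byte[] latin2 = { (byte) 0xA9, (byte) 0xE6 };
        String right = decode(latin2, "ISO-8859-2");
        String wrong = decode(latin2, "ISO-8859-1");
        System.out.println("decoded as Latin-2 matches: " + right.equals("\u0160\u0107"));
        System.out.println("decoded as Latin-1 matches: " + wrong.equals("\u0160\u0107"));
    }
}
```

The same two bytes yield different strings depending on the declared encoding, which is why the declared encoding has to match the encoding the file was saved in.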

Nix.

Re: Character Encoding problems 2

Posted by Martin Fekete <fe...@zoznam.sk>.
I had some problems with encoding too, but mine appeared when I submitted
data from forms (the page was in cp1250, the submitted data were
iso-8859-?). When I submitted the data and wrote it to the DB, the
characters were wrong. The solution was to add a filter which sets the
encoding of each request.

More here:
http://marc.theaimsgroup.com/?l=tomcat-user&m=100679292919360&w=2
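
The mismatch described above can be reproduced with plain JDK classes. This is a sketch under assumptions: the browser is taken to send windows-1250 bytes and the container to decode them as ISO-8859-1, and the class and method names are hypothetical. In a real web application the usual fix is to declare the request encoding before any parameter is read (Servlet 2.3 added ServletRequest.setCharacterEncoding for this); the filter itself is not shown because it needs the servlet API.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class FormEncodingSketch {
    static final Charset CP1250 = Charset.forName("windows-1250");

    // Simulate a container decoding windows-1250 request bytes as ISO-8859-1.
    static String misdecode(String typed) {
        return new String(typed.getBytes(CP1250), StandardCharsets.ISO_8859_1);
    }

    // Undo the wrong decode: recover the original bytes, then decode them
    // with the charset the browser actually used.
    static String repair(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), CP1250);
    }

    public static void main(String[] args) {
        String typed = "\u0161\u010D"; // s with caron, c with caron
        String garbled = misdecode(typed);
        System.out.println("garbled equals typed:  " + garbled.equals(typed));
        System.out.println("repaired equals typed: " + repair(garbled).equals(typed));
    }
}
```

Setting the encoding up front avoids the repair step entirely, which is why a filter is the cleaner option.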

feky




--
To unsubscribe:   <ma...@jakarta.apache.org>
For additional commands: <ma...@jakarta.apache.org>
Troubles with the list: <ma...@jakarta.apache.org>


Re: Character Encoding problems 2

Posted by Gregor <gregor.kovac@mikropis.si>.
Hi!

Nikola Milutinovic wrote:

>>>>I have had similar problem with Cp1250 encoding(Tomcat and MySQL). You 
>>>>have to have in mind this was not done on Tomcat 4.x, but 3.x.
>>>>This is what I have done:
>>>>- <%@page contentType="text/html; charset=windows-1250"%> on top of 
>>>>every JSP file
>>>>
>>>>
>>>I don't think that is a correct character encoding as far as Java is concerned. I think Java supports only ISO-8859-* and UTF-*. Please correct me if I'm wrong
>>>
>>>
>>
>>I'm sorry, but you are wrong. You can convert between numerous 
>>encodings, but you have to have i18n.jar in your classpath.
>>
> 
> Hmm, I thought that Java community loathed anything but ISO, where can I find i18n.jar? I'll look for it on Sun's site, but if it is not there, drop me a line.
> 

You can find it in the jre/lib directory of your JDK installation.


> 
>>>What I'm looking for is a "politically correct" solution. I have so far:
>>>
>>>- PostgreSQL with one Unicode and one ISO-8859-2 databases, both with the same data in correct form.
>>>- JDBC driver which is acting OK.
>>>- JSP pages with correctly set pageEncoding
>>>- Java Servlet with correctly set contentType/encoding
>>>
>>>Still, Tomcat goes for default charset encoding and screws up Latin-2 characters.
>>>
>>>
>>Have you tried putting <%@page contentType="text/html; 
>>charset=iso8859-2"%> on top of your JSPs?
>>
> 
> Always. And that is what is driving me crazy. I have even tested what is the character encoding of the ServletResponse object - it was OK, ISO-8859-2. The truth is I'm running 4.0.1 and I have been looking at sources for 4.0. I'll test 4.0 and if it displays characters correctly, there's gonna be a bug report.
> 
> Nix.
> 

Best regards,
	Kovi




Re: Character Encoding problems 2

Posted by Nikola Milutinovic <Ni...@ev.co.yu>.
> >>I have had similar problem with Cp1250 encoding(Tomcat and MySQL). You 
> >>have to have in mind this was not done on Tomcat 4.x, but 3.x.
> >>This is what I have done:
> >>- <%@page contentType="text/html; charset=windows-1250"%> on top of 
> >>every JSP file
> >>
> > 
> > I don't think that is a correct character encoding as far as Java is concerned. I think Java supports only ISO-8859-* and UTF-*. Please correct me if I'm wrong
> > 
> 
> 
> I'm sorry, but you are wrong. You can convert between numerous 
> encodings, but you have to have i18n.jar in your classpath.

Hmm, I thought that the Java community loathed anything but ISO. Where can I find i18n.jar? I'll look for it on Sun's site, but if it is not there, drop me a line.

> > What I'm looking for is a "politically correct" solution. I have so far:
> > 
> > - PostgreSQL with one Unicode and one ISO-8859-2 databases, both with the same data in correct form.
> > - JDBC driver which is acting OK.
> > - JSP pages with correctly set pageEncoding
> > - Java Servlet with correctly set contentType/encoding
> > 
> > Still, Tomcat goes for default charset encoding and screws up Latin-2 characters.
> > 
> 
> Have you tried putting <%@page contentType="text/html; 
> charset=iso8859-2"%> on top of your JSPs?

Always. And that is what is driving me crazy. I have even checked the character encoding of the ServletResponse object, and it was OK: ISO-8859-2. The truth is I'm running 4.0.1 and I have been looking at the sources for 4.0. I'll test 4.0, and if it displays the characters correctly, there's going to be a bug report.

Nix.

Re: Character Encoding problems 2

Posted by Gregor <gregor.kovac@mikropis.si>.
Hi!

Nikola Milutinovic wrote:

>>I have had similar problem with Cp1250 encoding(Tomcat and MySQL). You 
>>have to have in mind this was not done on Tomcat 4.x, but 3.x.
>>This is what I have done:
>>- <%@page contentType="text/html; charset=windows-1250"%> on top of 
>>every JSP file
>>
> 
> I don't think that is a correct character encoding as far as Java is concerned. I think Java supports only ISO-8859-* and UTF-*. Please correct me if I'm wrong
> 


I'm sorry, but you are wrong. You can convert between numerous 
encodings, but you have to have i18n.jar in your classpath.
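
A quick way to see which encodings a particular JVM can convert is to ask it directly. The sketch below uses java.nio.charset.Charset, which arrived in J2SE 1.4 and is therefore newer than the JDKs discussed in this thread; treat it as an illustration for later JDKs. The class name and the list of names tried are assumptions.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

public class CharsetSupportCheck {
    // True if this JVM can convert to and from the named encoding.
    static boolean supported(String name) {
        try {
            return Charset.isSupported(name);
        } catch (IllegalCharsetNameException e) {
            return false; // the name itself is malformed
        }
    }

    public static void main(String[] args) {
        String[] names = { "ISO-8859-2", "windows-1250", "Cp1250", "UTF-8" };
        for (String name : names) {
            System.out.println(name + " supported: " + supported(name));
        }
    }
}
```

The output depends on the JDK and on which optional charset packages are installed, so no particular result is claimed here beyond UTF-8, which every Java platform must support.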


> 
>>- default_character_set=latin2 in my.cnf
>>
> 
> Is there a way to set the default character encoding for Tomcat? Setting LOCALE on Unix?
> 


Hmm, I wouldn't know.... Sorry.


> 
>>- created new database so it gets created in latin2 character set
>>
> 
> Done that with PostgreSQL.
> 
> 
>>- when I connected to MySQL I was using mm.mysql driver and the database 
>>URL was 
>>jdbc:mysql://hostname:port/database?characterEncoding=Cp1250&useUnicode=true
>>
> 
> I've never used MySQL, just PostgreSQL. So, the database is ISO-8859-2 and this converts it to CP-1250, which goes by as Latin-1, as far as Tomcat is concerned.
> 
> I have had a similar "success" with my setup: the database was Latin-1, the data in it was win-1250 and when I forced JDBC connection to Latin-1 charset, it would pass through JSP. But that is such a hack...
> 
> 
>>Then all characters were correctly displayed on JSP pages.
>>
> 
> What I'm looking for is a "politically correct" solution. I have so far:
> 
> - PostgreSQL with one Unicode and one ISO-8859-2 databases, both with the same data in correct form.
> - JDBC driver which is acting OK.
> - JSP pages with correctly set pageEncoding
> - Java Servlet with correctly set contentType/encoding
> 
> Still, Tomcat goes for default charset encoding and screws up Latin-2 characters.
> 

Have you tried putting <%@page contentType="text/html; 
charset=iso8859-2"%> on top of your JSPs?


> Any help?
> 
> Nix.
> 


Best regards,
	Kovi







Re: Character Encoding problems 2

Posted by Nikola Milutinovic <Ni...@ev.co.yu>.
> I have had similar problem with Cp1250 encoding(Tomcat and MySQL). You 
> have to have in mind this was not done on Tomcat 4.x, but 3.x.
> This is what I have done:
> - <%@page contentType="text/html; charset=windows-1250"%> on top of 
> every JSP file

I don't think that is a correct character encoding as far as Java is concerned. I think Java supports only ISO-8859-* and UTF-*. Please correct me if I'm wrong

> - default_character_set=latin2 in my.cnf

Is there a way to set the default character encoding for Tomcat? Setting LOCALE on Unix?
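
As a starting point for that question, the JVM's own default can be inspected with the sketch below. On older JVMs this default is typically derived from the OS locale or the file.encoding system property. Note the hedge: the servlet specification says a response with no declared charset uses ISO-8859-1, so the JVM default is not necessarily what a container applies to responses. The class name is hypothetical, and Charset.defaultCharset() requires Java 5 or later.

```java
import java.nio.charset.Charset;

public class DefaultCharsetCheck {
    // Report the JVM-wide default charset and the property it is usually derived from.
    static String describe() {
        return "default charset=" + Charset.defaultCharset().name()
                + ", file.encoding=" + System.getProperty("file.encoding");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running this inside the same JVM that hosts Tomcat (for example from a scratch JSP or servlet) shows what the container process sees, which may differ from an interactive shell.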

> - created new database so it gets created in latin2 character set

Done that with PostgreSQL.

> - when I connected to MySQL I was using mm.mysql driver and the database 
> URL was 
> jdbc:mysql://hostname:port/database?characterEncoding=Cp1250&useUnicode=true

I've never used MySQL, just PostgreSQL. So, the database is ISO-8859-2 and this converts it to CP-1250, which passes as Latin-1 as far as Tomcat is concerned.

I have had a similar "success" with my setup: the database was Latin-1, the data in it was win-1250 and when I forced JDBC connection to Latin-1 charset, it would pass through JSP. But that is such a hack...
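
Why that hack appears to work can be shown with a small JDK-only sketch: ISO-8859-1 maps every byte value to the code point with the same number, so bytes survive a decode and re-encode unchanged even when they really hold win-1250 text. The strings in between hold the wrong characters, which is why it remains a hack. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Latin1PassThrough {
    // Decode bytes as ISO-8859-1 and encode them again.
    static byte[] roundTrip(byte[] raw) {
        String asLatin1 = new String(raw, StandardCharsets.ISO_8859_1);
        return asLatin1.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Every possible byte value, 0x00 through 0xFF.
    static byte[] allBytes() {
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;
        }
        return all;
    }

    public static void main(String[] args) {
        byte[] all = allBytes();
        System.out.println("bytes unchanged: " + Arrays.equals(all, roundTrip(all)));
    }
}
```

Any code that inspects the intermediate String (sorting, case conversion, searching) would still see Latin-1 characters rather than the intended Latin-2 ones.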

> Then all characters were correctly displayed on JSP pages.

What I'm looking for is a "politically correct" solution. I have so far:

- PostgreSQL with one Unicode and one ISO-8859-2 database, both with the same data in correct form.
- JDBC driver which is acting OK.
- JSP pages with correctly set pageEncoding
- Java Servlet with correctly set contentType/encoding

Still, Tomcat goes for default charset encoding and screws up Latin-2 characters.

Any help?

Nix.

Re: Character Encoding problems 2

Posted by Gregor <gregor.kovac@mikropis.si>.
Hi!

I have had a similar problem with Cp1250 encoding (Tomcat and MySQL). Keep 
in mind that this was done on Tomcat 3.x, not 4.x.
This is what I have done:
- <%@page contentType="text/html; charset=windows-1250"%> on top of 
every JSP file
- default_character_set=latin2 in my.cnf
- created new database so it gets created in latin2 character set
- when I connected to MySQL I was using mm.mysql driver and the database 
URL was 
jdbc:mysql://hostname:port/database?characterEncoding=Cp1250&useUnicode=true

Then all characters were correctly displayed on JSP pages.

I hope this helps.

Best regards,
	Kovi



