Posted to users@tomcat.apache.org by hk...@dfki.uni-kl.de on 2007/05/21 16:51:04 UTC

Encoding in Tomcat 6

Hi all,

I noticed some encoding problems inside servlets when switching from
Tomcat 5.5.20 to Tomcat 6.0.10. I searched the mailing lists
but didn't find anything relevant.


Scenario:
A custom servlet (that is, a class derived from HttpServlet) creates
very simple HTML output, containing (besides the necessary HTML tags
<html>,<body> etc.) just some German special characters (ä ö ü).

The Java source code is UTF-8; the response instance is configured via
  response.setContentType( "text/html;charset=UTF-8" );
Just for safety I also added
  response.setCharacterEncoding( "UTF-8" );

The created HTML text contains a meta tag
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

Nevertheless, when calling the corresponding URL, none of the special
characters are displayed correctly in the browser (Firefox) when
using Tomcat 6. If I switch the encoding of the displayed page to
ISO-8859-1 in Firefox, the characters are displayed correctly. That is,
it seems to me that everything is okay with the servlet, except that the
encoding actually used for the response body is ISO-8859-1 instead of UTF-8.
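
To illustrate the mismatch described above, here is a small sketch outside
of any servlet. The class name EncodingMismatchDemo is made up for this
example, and StandardCharsets assumes Java 7 or later. It encodes the same
three characters once as ISO-8859-1 and once as UTF-8, then decodes both
byte sequences as UTF-8, which is roughly what a browser does when the
header announces UTF-8.

```java
import java.nio.charset.StandardCharsets;

final class EncodingMismatchDemo {
    // Unicode escapes keep the demo independent of the source file encoding.
    static final String TEXT = "\u00e4 \u00f6 \u00fc";

    static byte[] latin1Bytes() {
        return TEXT.getBytes(StandardCharsets.ISO_8859_1);
    }

    static byte[] utf8Bytes() {
        return TEXT.getBytes(StandardCharsets.UTF_8);
    }

    // Roughly what a browser does when the Content-Type header announces UTF-8.
    static String decodeAsUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println("ISO-8859-1 bytes: " + latin1Bytes().length);
        System.out.println("UTF-8 bytes:      " + utf8Bytes().length);
        System.out.println("UTF-8 bytes read as UTF-8 intact:      "
                + decodeAsUtf8(utf8Bytes()).equals(TEXT));
        System.out.println("ISO-8859-1 bytes read as UTF-8 intact: "
                + decodeAsUtf8(latin1Bytes()).equals(TEXT));
    }
}
```

The ISO-8859-1 bytes are not valid UTF-8, so the decoder substitutes
replacement characters; that would match broken umlauts in a page that is
labelled UTF-8 but whose body was written as ISO-8859-1.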

When using Tomcat 5.5 everything is displayed correctly as UTF-8. Java
Server Pages do _not_ show similar behaviour.

Has anyone experienced similar problems?

---------------------------------------------------------------------
To start a new topic, e-mail: users@tomcat.apache.org
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org


Re: Encoding in Tomcat 6

Posted by hk...@dfki.uni-kl.de.
Markus Schönhaber wrote:
> ... ServletOutputStream is "suitable for writing binary data in the
> response" as the docs say. If you want to transmit textual data, use
> HttpServletResponse#getWriter() (see my question above).
Yes, that really is the point; Georg's answer already pointed me in the
right direction. Nevertheless, I must say it's not obvious to me what
'writing binary data in the response' means. I would have
expected that setting the response's character encoding and then writing to
its output stream would get the encoding right. The Java string cannot
be written as is (because it is Java's internal representation of a
string), so the conversion to bytes somewhere in the dark behind
the response class could be done correctly, because I did set the encoding.

Anyway, it works with the getWriter() method, as I have already
checked. Thank you very much for your help.

Regards,
	hk



Re: Encoding in Tomcat 6

Posted by Markus Schönhaber <ma...@schoenhaber.de>.
hkml@dfki.uni-kl.de wrote:

> Markus Schönhaber wrote:
>  > Works fine for me.
> Well, that is really a surprise for me. I tried this on 3 different
> operating systems and it was consistently wrong.

That, in turn, doesn't surprise me, since...

>  > You do call response#setContentType before response#getWriter, don't you?
>  > There's no filter changing things?
> Well, the code is more or less trivial: the class extends HttpServlet
> and overrides the doGet method like this:
> 
> @Override
>      protected void doGet( HttpServletRequest request,
>                            HttpServletResponse response )
>          throws ServletException, IOException
>      {
>          response.setContentType( "text/html;charset=UTF-8" );
>          response.setCharacterEncoding( "utf-8" );
> 
>          ServletOutputStream out = response.getOutputStream();

... ServletOutputStream is "suitable for writing binary data in the
response" as the docs say. If you want to transmit textual data, use
HttpServletResponse#getWriter() (see my question above).

Regards
  mks



Re: Encoding in Tomcat 6

Posted by hk...@dfki.uni-kl.de.
Markus Schönhaber wrote:
 > Works fine for me.
Well, that is really a surprise for me. I tried this on 3 different
operating systems and it was consistently wrong.

 > You do call response#setContentType before response#getWriter, don't you?
 > There's no filter changing things?
Well, the code is more or less trivial: the class extends HttpServlet
and overrides the doGet method like this:

@Override
     protected void doGet( HttpServletRequest request,
                           HttpServletResponse response )
         throws ServletException, IOException
     {
         response.setContentType( "text/html;charset=UTF-8" );
         response.setCharacterEncoding( "utf-8" );

         ServletOutputStream out = response.getOutputStream();
         out.println( "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0" +
              " Strict//EN\" \"http://www.w3.org/TR/xhtml1" +
              "/DTD/xhtml1-strict.dtd\">" );
         out.println( "<html><head>" );
         out.println( "<meta http-equiv=\"Content-Type\" " +
                          "content=\"text/html; charset=utf-8\" />" );
         out.println( "</head>" );
         out.println( "<body>" );
         out.println( "<p>Just an encoding test: ä ö ü Ä Ö Ü ß</p>" );
         out.println( "</body>" );
         out.println( "</html>" );
     }

That's all.

 > BTW: I consider LiveHTTPHeaders an incredibly useful Firefox extension
 > when it comes to finding out which headers the server really sends.
Thanks for the hint, I just installed it. Nevertheless, in this case the
HTTP header must be innocent: Firefox is using UTF-8 to show
the page, which is absolutely correct. The problem is simply
that the characters are encoded as ISO-8859-1 (probably by the response
output stream).
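
A possible explanation, offered as an assumption rather than something
stated in this thread: as far as I can tell from the reference servlet API
sources, ServletOutputStream.print(String) writes one byte per char and
rejects any char above U+00FF, which amounts to ISO-8859-1 no matter what
character encoding was set on the response. Containers and versions may
differ. The sketch below imitates that behaviour with a hypothetical
stand-in class (LowByteStream is not part of the servlet API) and an
in-memory stream in place of the real response.

```java
import java.io.ByteArrayOutputStream;
import java.io.CharConversionException;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class LowByteStreamDemo {
    /** Hypothetical stand-in for a byte stream that offers print(String). */
    static final class LowByteStream {
        private final OutputStream out;

        LowByteStream(OutputStream out) {
            this.out = out;
        }

        // Writes one byte per char; chars above U+00FF cannot be represented.
        void print(String s) throws IOException {
            for (int i = 0; i < s.length(); i++) {
                char c = s.charAt(i);
                if ((c & 0xff00) != 0) {
                    throw new CharConversionException("Not an ISO 8859-1 character: " + c);
                }
                out.write(c);
            }
        }
    }

    // Prints the text into an in-memory sink and returns the bytes written.
    static byte[] printed(String s) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            new LowByteStream(sink).print(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        String text = "\u00e4 \u00f6 \u00fc";
        byte[] bytes = printed(text);
        System.out.println("bytes written: " + bytes.length);
        System.out.println("same as ISO-8859-1: "
                + Arrays.equals(bytes, text.getBytes(StandardCharsets.ISO_8859_1)));
    }
}
```

If the real stream behaves like this, the header says UTF-8 while the body
is effectively ISO-8859-1, which fits what Firefox shows.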

Best regards,
	hk



Re: Encoding in Tomcat 6

Posted by Markus Schönhaber <ma...@schoenhaber.de>.
hkml@dfki.uni-kl.de wrote:

> Scenario:
> An own servlet (that is: a class derived from HttpServlet) is creating
> very simple HTML output, containing (beside the necessary HTML tags
> <html>,<body> etc.) just some German special characters (ä ö ü).
> 
> The java source code is UTF-8, the response instance is configured via
>   response.setContentType( "text/html;charset=UTF-8" );
> Just for safety I also added
>   response.setCharacterEncoding( "UTF-8" );
> 
> The created HTML text contains a meta tag
>   <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
> 
> Nevertheless: when calling the corresponding URL, all the special
> characters are not displayed correctly in the browser (Firefox), when
> using Tomcat 6. If I switch the encoding of the displayed page to
> ISO-8859-1 in Firefox the characters are displayed correctly. That is:
> it seems to me that everything is okay with the servlet, except that the
> used encoding for the response is ISO-8859-1 instead of UTF-8.

Works fine for me.
You do call response#setContentType before response#getWriter, don't you?
There's no filter changing things?

BTW: I consider LiveHTTPHeaders an incredibly useful Firefox extension
when it comes to finding out which headers the server really sends.

Regards
  mks



Re: Encoding in Tomcat 6

Posted by hk...@dfki.uni-kl.de.
uzi wrote:
> i liked this article regarding encoding:
> http://java.sun.com/developer/technicalArticles/Intl/HTTPCharset/index.html
Thanks for the hint. Looks nice.

Cheers,
	Heinz



Re: Encoding in Tomcat 6

Posted by uzi <de...@gmx.de>.
I liked this article regarding encoding:
http://java.sun.com/developer/technicalArticles/Intl/HTTPCharset/index.html

I think it says all one has to know (at least in the context of web
apps).

uzi


Re: Encoding in Tomcat 6

Posted by hk...@dfki.uni-kl.de.
Georg Sauer-Limbach wrote:
>> I do not think it is very obvious, that the response class is writing
>> the characters using the platform's default encoding in this case
> 
> Yes. And this is true for many, many places in the
> Java library. Always watch out if you see some
> String being processed using a Stream.
Yes, in general I take care of that, but in this case:
The response (which knows what encoding I want) gives me a special stream
where I find a method println( String s ). Why on earth should they
guess a character encoding for character output then?
Nevertheless: they said what they did in the apidoc, so it must be okay.
Strangely enough, it worked correctly in older Tomcat versions.

> The ServletOutputStream shouldn't have all these
> print methods, at least not the one for String.
The word deprecated comes to my mind :-)

Cheers and thanks again,
	Heinz



Re: Encoding in Tomcat 6

Posted by Georg Sauer-Limbach <gs...@gslweb.de>.
hkml@dfki.uni-kl.de wrote:
> Georg Sauer-Limbach wrote:
>> the question is: How do you create the output of
>> the servlet, that is, with which Writer or OutputStream.
> yes you're right: I simply used the output stream.

Never do this if you want to output character data.
(Unless you do the encoding yourself, like in
   outStream.write( myString.getBytes( "ISO-8859-1" ) );
or you got the bytes from some pre-encoded byte data,
say, from a file; but be sure what the encoding of
that file is!)
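
A minimal sketch of "doing the encoding yourself", under a few assumptions:
a ByteArrayOutputStream stands in for the response's output stream, the
class and method names are made up, and StandardCharsets (Java 7 or later)
replaces the charset name string. UTF-8 is used here instead of ISO-8859-1
so that the bytes agree with a response declared as charset=UTF-8.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class ExplicitBytesDemo {
    // Encodes the text explicitly, then writes the raw bytes to the stream.
    static byte[] writeEncoded(String text) {
        ByteArrayOutputStream outStream = new ByteArrayOutputStream();
        byte[] encoded = text.getBytes(StandardCharsets.UTF_8);
        outStream.write(encoded, 0, encoded.length);
        return outStream.toByteArray();
    }

    public static void main(String[] args) {
        String text = "H\u00e4llo W\u00f6rld.";
        byte[] sent = writeEncoded(text);
        // Decoding with the same charset gives back the original string.
        String received = new String(sent, StandardCharsets.UTF_8);
        System.out.println("bytes sent: " + sent.length);
        System.out.println("round trip intact: " + received.equals(text));
    }
}
```

The point is only that the charset is named at the place where the string
becomes bytes, so nothing depends on a default.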

> I do not think it is very obvious, that the response class is writing the 
> characters using the platform's default encoding in this case

Yes. And this is true for many, many places in the
Java library. Always watch out if you see some
String being processed using a Stream.

The ServletOutputStream shouldn't have all these
print methods, at least not the one for String.

> Nevertheless I checked the javadoc and it correctly says, that 
> ServletOutputStream is just for binary output (whatever the use of 
> binary data in a website is).

Images or PDF for example.

Georg



Re: Encoding in Tomcat 6

Posted by hk...@dfki.uni-kl.de.
Georg Sauer-Limbach wrote:
> the question is: How do you create the output of
> the servlet, that is, with which Writer or OutputStream.
yes you're right: I simply used the output stream.

> But if you just obtain the output byte stream of the servlet,
> ie by calling
> 
>   OutputStream outputStream = response.getOutputStream();
> 
> and you use this stream to output character data, then the
> call to response.setCharacterEncoding() is completely useless.
You're obviously right; I tried using getWriter() and things work as
expected. But it is as I said in my last answer to Markus Schönhaber: I
do not think it is very obvious that the response class writes the
characters using the platform's default encoding in this case (the
correct encoding is well known!).

Nevertheless, I checked the javadoc, and it correctly says that
ServletOutputStream is just for binary output (whatever the use of
binary data in a website is).

> Then it only counts what you do write to this stream yourself.
> ...
> 
> Hope this helps.
Yes, it really did. Thank you very much.

Heinz




Re: Encoding in Tomcat 6

Posted by Georg Sauer-Limbach <gs...@gslweb.de>.
Hi,

the question is: How do you create the output of
the servlet, that is, with which Writer or OutputStream.

If you do this:

   public void doGet( HttpServletRequest request,
               HttpServletResponse response ) throws IOException {
     response.setCharacterEncoding( "UTF-8" );
     Writer writer = response.getWriter();
     writer.write( "Hällo Wörld." );
   }

Then the Writer you obtain by response.getWriter()
takes into account what you set by calling setCharacterEncoding().
Now this Writer will write strings in UTF-8 encoding to
the output byte stream.

But if you just obtain the output byte stream of the servlet,
ie by calling

   OutputStream outputStream = response.getOutputStream();

and you use this stream to output character data, then the
call to response.setCharacterEncoding() is completely useless.
Then the only thing that counts is what you write to this stream yourself.
This would be wrong:

   outputStream.write( "Hällo Wörld.".getBytes() );
   // who knows what encoding is used here: it is the
   // "platform's default encoding"

This would be OK:

   Writer goodWriter = new java.io.OutputStreamWriter(
       response.getOutputStream(), "UTF-8" );

Only by using OutputStreamWriter explicitly with this
constructor (or the newer ones, with the Charset and
CharsetEncoder arguments) can you safely create character
data output with the intended encoding.
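
As a runnable sketch of the same idea outside a servlet container: the
class name WriterDemo is made up, a ByteArrayOutputStream stands in for
response.getOutputStream(), and the Charset-based constructor is used via
StandardCharsets (which assumes Java 7 or later).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class WriterDemo {
    // Writes the text through a Writer that encodes to UTF-8 explicitly.
    static byte[] writeText(String text) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(sink, StandardCharsets.UTF_8)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The writer is closed (and therefore flushed) at this point.
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        String text = "H\u00e4llo W\u00f6rld.";
        byte[] bytes = writeText(text);
        System.out.println("bytes written: " + bytes.length);
        System.out.println("same as explicit UTF-8 encoding: "
                + Arrays.equals(bytes, text.getBytes(StandardCharsets.UTF_8)));
    }
}
```

Remember to flush or close the Writer; otherwise buffered characters may
never reach the underlying stream.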

Hope this helps.

Georg




Re: Encoding in Tomcat 6

Posted by hk...@dfki.uni-kl.de.
hkml@dfki.uni-kl.de wrote:
> <html>,<body> etc.) just some German special characters (ä ö ü).
Sorry for that encoding problem; it should read ä ö ü. I first sent the
message from a different mail address. I then got a response from the
list server saying that I'm not allowed to send messages to this list, and
afterwards I simply copied and pasted the text from the returned email.
The returned email already contained the wrong characters, because the
mailer daemon answers without setting the encoding in the mail
header properly.



RE: Encoding in Tomcat 6

Posted by "Fargusson.Alan" <Al...@ftb.ca.gov>.
Is it valid to say "charset=UTF-8"?
