Posted to users@tomcat.apache.org by Allistair Crossley <Al...@QAS.com> on 2004/09/01 10:50:11 UTC

RE: ++ Best practive ?? ++ (JSP-->Servlet-->Database) character encoding.

We had to look at several areas:

1. JSP pageEncoding

<%@ page contentType="text/html; charset=UTF-8" %>

This ensures that the JSPs can display pretty much any character, because the response is sent as UTF-8. Our SQL Server database, however, uses the Latin1_General_CI_AS collation (whose code page does include the euro sign).

2. Database Connection URL

jdbc:jtds:sqlserver://intratestgbr:1433/db_iQ;charset=Cp1252;TDS=7.0

We discovered that we _had_ to talk to the database using an encoding it understood. It turned out that the Latin1_General_CI_AS collation corresponds to code page 1252 (Cp1252), so we set that character encoding explicitly on our database driver (the charset parameter in the jTDS URL above).
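
To make the difference between these encodings concrete, here is a minimal sketch (not from the original setup) that prints the bytes the euro sign becomes in each charset discussed in this thread. The class and method names are hypothetical, and it assumes the JVM ships the windows-1252 (Java's older name: Cp1252) and ISO-8859-15 charsets, which common JDKs do.

```java
import java.nio.charset.Charset;

// Hypothetical helper for illustration only.
final class EuroBytes {
    /** Encodes the euro sign (U+20AC) in the named charset and returns the bytes as upper-case hex. */
    static String hex(String charsetName) {
        byte[] bytes = "\u20AC".getBytes(Charset.forName(charsetName));
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] names = {"ISO-8859-1", "ISO-8859-15", "windows-1252", "UTF-8"};
        for (String name : names) {
            System.out.println(name + " -> " + hex(name));
        }
    }
}
```

ISO-8859-1 has no euro sign at all, so Java substitutes the replacement byte 3F ("?"), while ISO-8859-15 (A4), windows-1252 (80) and UTF-8 (E2 82 AC) each use a different byte sequence. Any hop in the chain that encodes with the wrong charset will therefore lose or corrupt the character.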

3. Request Character Encoding

Taken from http://weblogs.java.net/pub/wlg/1078

This matters when submitting information via an HTML form. Most browsers don't appear to send back a charset in the request that corresponds to the encoding that was used to format the page. In that case the request character encoding defaults to ISO-8859-1, so there is potentially a mismatch between the form data being sent (in UTF-8) and the parameters retrieved from the request (decoded as ISO-8859-1) using the getParameter() method on the HttpServletRequest class. To fix this, explicitly set the character encoding of the request before reading any parameters:

request.setCharacterEncoding("UTF-8");

This is what the filter code I sent you does for all requests.
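
As a rough illustration of why that one call matters, the following sketch simulates the mismatch without a servlet container. It is not the filter itself: the class and method names are hypothetical, and it only assumes that the browser posts UTF-8 bytes and that the server decodes them with whatever request encoding is in effect.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical simulation; a real application would go through HttpServletRequest.
final class RequestDecodingSketch {
    /** The bytes a browser would post for a form value entered on a UTF-8 page. */
    static byte[] postedBytes(String formValue) {
        return formValue.getBytes(StandardCharsets.UTF_8);
    }

    /** What the application would see if the posted bytes were decoded with the given charset. */
    static String decodeAs(byte[] wire, Charset requestEncoding) {
        return new String(wire, requestEncoding);
    }

    public static void main(String[] args) {
        byte[] wire = postedBytes("\u20AC 10"); // euro sign, space, "10"
        String wrong = decodeAs(wire, StandardCharsets.ISO_8859_1); // default request encoding
        String right = decodeAs(wire, StandardCharsets.UTF_8);      // after setCharacterEncoding("UTF-8")
        System.out.println("decoded as ISO-8859-1: " + wrong.length() + " chars");
        System.out.println("decoded as UTF-8:      " + right.length() + " chars");
    }
}
```

Decoded as ISO-8859-1, the three UTF-8 bytes of the euro sign turn into three separate characters, so the value is already corrupted before it ever reaches the database. Decoded as UTF-8, the original four-character string comes back intact.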

I hope this clears up your issue!

All the best, and good luck!

ADC.


> -----Original Message-----
> From: Ben Bookey [mailto:ben.bookey@gistec-online.de]
> Sent: 01 September 2004 09:37
> To: Allistair Crossley
> Cc: Tomcat User List
> Subject: How to pre-determine the browser request character encoding
> type
> 
> 
> Hi Allistair,
> 
> I hope you still find time to do your own work... more questions :)
> 
> Why does the IE client, which is definitely reading/parsing the page as
> ISO-8859-15 (I can see this in the IE menu bar), post to the server with
> the Euro converted to a question mark? That seems rather stupid of IE,
> doesn't it? It is definitely reading as ISO-8859-15, but then posts as
> ISO-8859-1 anyway.
> 
> Could you explain in simple English how the filter ensures that the
> request is decoded as UTF-8?


> -----Original Message-----
> From: Ben Bookey [mailto:ben.bookey@gistec-online.de]
> Sent: 01 September 2004 09:37
> To: Tomcat User List
> Cc: Allistair Crossley
> Subject: ++ Best practive ?? ++ (JSP-->Servlet-->Database) character
> encoding.
> 
> 
> 
> Dear list,
> 
> We have a web-based JSP/servlet application that performs updates,
> deletes and inserts against an Oracle database, running on Tomcat 5. We
> want to support both American and European client locales, so we want to
> use either ISO-8859-15 or UTF-8. However, we are having problems saving
> the Euro symbol when using ISO-8859-15 encoding.
> 
> I had previously assumed that, because Java works with Unicode by
> default, all data entered in an HTML form would be saved into the
> database as UTF-8 (i.e. as soon as a value is assigned to a Java data
> object, e.g. a String or an int). I am beginning to think this is not
> the case, and that all data is saved in the database in the original
> encoding as posted by the browser. Can someone please explain what is
> really going on? Do I need some code that checks the browser encoding in
> the HTTP header and then converts/parses accordingly to a chosen
> standard? That would avoid our database ending up with records in
> different character encodings, which I suspect is what is happening now.
> 
> In addition, how does Tomcat deal with framesets containing many HTML
> pages? Are the pages all treated individually (in theory allowing a
> different character encoding in each HTML frame), or as one unit?
> 
> I look forward very much to any reply on this matter.
> 
> Sincerely,
> 
> 
> Ben Bookey
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: tomcat-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: tomcat-user-help@jakarta.apache.org
> 
> 


-------------------------------------------------------
QAS Ltd.
Developers of QuickAddress Software
www.qas.com
Registered in England: No 2582055
Registered in Australia: No 082 851 474
-------------------------------------------------------




RE: ++ Best practive ?? ++ (JSP-->Servlet-->Database) character encoding.

Posted by Ben Bookey <be...@gistec-online.de>.
Hi Allistair,

I have installed your filter and it seems to be working: the request is
now processed as UTF-8 on the server.


	<%@ page language="java" errorPage="MainIdentificationMainError.jsp"%>
	<%@ page import="java.lang.*,java.io.*,java.sql.*,javax.naming.*,javax.sql.*,java.util.Enumeration,java.util.*"%>
	<%@ page import="com.gistec.webentrytool.*,org.apache.torque.om.BaseObject,org.apache.torque.Torque,org.apache.torque.TorqueException,org.apache.torque.TorqueRuntimeException,org.apache.torque.util.Criteria,java.sql.*"%>
	<%@ page session="true"%>
	<%@ page contentType="text/html;charset=utf-8"%>
	<%@ page pageEncoding="utf-8"%>

But even when I use the above header in my JSP page, the Euro symbol is
still displayed incorrectly (with the browser set to display UTF-8).

I would appreciate any support.

regards

Ben








