Posted to users@tomcat.apache.org by Fadwa Barham <fs...@yahoo.com> on 2005/05/11 09:55:01 UTC

Re: Arabic encoding

While searching for a solution to the encoding problem, I found this:
 
There is a standard for encoding URIs
(http://www.w3.org/International/O-URL-code.html) but this standard is not
consistently followed by clients. This causes a number of problems.

The functionality provided by Tomcat (4 and 5) to handle this less than ideal 
situation is described below.

1. The Coyote HTTP/1.1 connector has a useBodyEncodingForURI attribute which 
if set to true will use the request body encoding to decode the URI query 
parameters.
  - The default value is true for TC4 (breaks spec but gives consistent 
behaviour across TC4 versions)
  - The default value is false for TC5 (spec compliant but there may be 
migration issues for some apps)
2. The Coyote HTTP/1.1 connector has a URIEncoding attribute which defaults to 
ISO-8859-1.
3. The parameters class (o.a.t.u.http.Parameters) has a QueryStringEncoding 
field which defaults to the URIEncoding. It must be set before the parameters 
are parsed to have an effect.
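
To illustrate why the charset used for the query string matters, here is a minimal Java sketch. It does not touch Tomcat at all; it only uses the JDK's java.net.URLDecoder (the Charset overload needs Java 10 or later) to decode the same percent-encoded value twice. The class name UriDecodingDemo and the sample value are made up for the illustration.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class (not part of Tomcat): decodes one percent-encoded
// query value with two different charsets.
class UriDecodingDemo {

    // Decode a percent-encoded value using the given charset.
    static String decode(String encoded, Charset charset) {
        return URLDecoder.decode(encoded, charset);
    }

    public static void main(String[] args) {
        // %D8%B9 is the UTF-8 percent-encoding of the Arabic letter ain (U+0639).
        String encoded = "%D8%B9";

        String asUtf8 = decode(encoded, StandardCharsets.UTF_8);        // one character
        String asLatin1 = decode(encoded, StandardCharsets.ISO_8859_1); // two characters

        System.out.println("UTF-8 length:      " + asUtf8.length());   // prints 1
        System.out.println("ISO-8859-1 length: " + asLatin1.length()); // prints 2
    }
}
```

If the browser sent the value as UTF-8 but the connector decodes it as ISO-8859-1 (the URIEncoding default mentioned above), the application would see two Latin-1 characters instead of one Arabic letter.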

Things to note regarding the servlet API:
1. HttpServletRequest.setCharacterEncoding() normally only applies to the 
request body NOT the URI.
2. HttpServletRequest.getPathInfo() is decoded by the web container.
3. HttpServletRequest.getRequestURI() is not decoded by the container.

Other tips:
1. Use POST with forms to return parameters as the parameters are then part of 
the request body.


Does this mean that the encoding changes between TC4 and TC5 are the reason I can't get the right encoding in the newer versions of Tomcat? If so, how do I solve the problem?
 
Thanks

----- Original Message ----- 
From: "Fadwa Barham" 
To: "Tomcat Users List" 
Sent: Tuesday, March 01, 2005 3:24 AM
Subject: Re: Arabic encoding


> Since Tomcat 4.1.31 handles Arabic correctly, and so far it also seems to
> have solved the JNDI datasource problems (intermittent DB connection
> failures and random "Connection closed" exceptions), I will use Tomcat
> 4.1.31 until I can configure the latest versions of Tomcat.
> I feel unlucky.
> ----- Original Message ----- 
> From: "Fadwa Barham" 
> To: "Tomcat Users List" 
> Sent: Tuesday, March 01, 2005 2:39 AM
> Subject: Re: Arabic encoding
>
>
>>I tested many Tomcat versions. I found no problems with Arabic up to
>>Tomcat 4.1.31, but when I tried tomcat-4.1.18 and newer versions, I faced
>>the same problem.
>>
>> ----- Original Message ----- 
>> From: "Benson Margulies" 
>> To: "Tomcat Users List" 
>> Sent: Sunday, February 27, 2005 4:08 PM
>> Subject: RE: Arabic encoding
>>
>>
>>> It depends on what the Oracle JDBC driver does with byte values that are
>>> not legitimate US7ASCII. If, for some reason, it treated the data as
>>> ISO-8859-1 instead of US7ASCII, then it might have streamed out through
>>> tomcat, and the browser would have auto-detected the CP1256 pretending
>>> to be ISO-8859-1.
>>>
>>> -----Original Message-----
>>> From: Fadwa Barham [mailto:fadwa@najah.edu]
>>> Sent: Sunday, February 27, 2005 1:43 PM
>>> To: Tomcat Users List
>>> Subject: Re: Arabic encoding
>>>
>>> But I wonder why the old Tomcat and Java displayed Arabic correctly, when
>>> I use the same classes12.jar with both the old and the new versions.
>>> I want to know what the difference is: which encoding did they stop
>>> supporting? It looks as if Tomcat cannot understand the old Java, because
>>> I have to change the encoding to Arabic (Windows) in Internet Explorer
>>> each time I request the servlet, and when I do this, every Arabic
>>> character is displayed correctly.
>>> I think it is better to understand the problem and the changes so I can
>>> handle the problem if I face it again in newer versions of Tomcat
>>> or Java.
>>> I know that having the database in US7ASCII is not good, but changing the
>>> database encoding each time I face the problem is not the right way. I
>>> may change it this time, but I need to understand.
>>> Thanks
>>>
>>> ----- Original Message -----
>>> From: "Benson Margulies" 
>>> To: "Tomcat Users List" 
>>> Sent: Sunday, February 27, 2005 12:44 AM
>>> Subject: RE: Arabic encoding
>>>
>>>
>>>> Oracle's ODBC driver will transcode from the database to UTF-16 based on
>>>> the database encoding. If the database is in US7ASCII, this is a
>>>> destructive process for Arabic. The only alternative I can think of is
>>>> to do all your database I/O in hex.
>>>>
>>>> -----Original Message-----
>>>> From: Fadwa Barham [mailto:fadwa@najah.edu]
>>>> Sent: Saturday, February 26, 2005 1:20 PM
>>>> To: Tomcat Users List
>>>> Subject: Re: Arabic encoding
>>>>
>>>> I use an Oracle 7 database, and the NLS language is
>>>> American_America.US7ASCII; it is not easy to change it to UTF-8.
>>>> Besides, the question is: a servlet works fine on Tomcat 4.0.6, so why
>>>> did it stop working with the new versions? What changes were made to
>>>> the encoding handling in Tomcat?
>>>> Do I need tomcat-i18n-ar.jar? If so, where do I get it?
>>>> I can't determine where the problem is: is it in the new Java or the
>>>> new Tomcat?
>>>> Thanks in advance
>>>>
>>>> ----- Original Message -----
>>>> From: "Benson Margulies" 
>>>> To: "Tomcat Users List" 
>>>> Sent: Wednesday, February 23, 2005 11:26 PM
>>>> Subject: RE: Arabic encoding
>>>>
>>>>
>>>>> What database? Do you have the database set up to deliver Unicode, or
>>>>> CP1256, correctly? Note that not all Arabic fits into CP1256; you might
>>>>> really be better off with UTF-8.
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: tomcat-user-unsubscribe@jakarta.apache.org
>>>>> For additional commands, e-mail: tomcat-user-help@jakarta.apache.org

Re: Arabic encoding

Posted by Fadwa Barham <fs...@yahoo.com>.
I tried what you suggested. The characters that I write inside the servlet are displayed correctly, but the data from the database comes out as question marks.
It is true that remote debugging is not easy, but many programmers who work in languages other than English, French, or Spanish face this problem. I found reports about encoding in the bug database, but I couldn't find a clear answer.
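
One way question marks can arise is sketched below in plain Java. This is only an illustration of the general mechanism (text forced through a charset that cannot represent it), not a statement about what the Oracle driver actually does; the class name QuestionMarkDemo is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows how Arabic text turns into question marks
// when it passes through a charset that cannot represent it.
class QuestionMarkDemo {

    // Encode text to bytes with the given charset, then decode it again.
    static String roundTrip(String text, Charset charset) {
        // String.getBytes replaces unmappable characters with the charset's
        // replacement byte, which is '?' for US-ASCII.
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // An Arabic word written with Unicode escapes, so the demo does not
        // depend on the encoding of the source file.
        String arabic = "\u0639\u0631\u0628\u064A";

        System.out.println(roundTrip(arabic, StandardCharsets.US_ASCII));             // prints ????
        System.out.println(roundTrip(arabic, StandardCharsets.UTF_8).equals(arabic)); // prints true
    }
}
```

Once a character has been replaced by '?', the original is gone; no setting further down the chain (servlet, Tomcat, browser) can bring it back.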
 
 


Mark Thomas <ma...@apache.org> wrote:
From re-reading this thread it sounds as if an invalid assumption is 
being made somewhere about the encoding of your database data.

I would suggest the following:
1. Use res.setContentType("text/html; charset=UTF-8") or
res.setContentType("text/html; charset=windows-1256") in your servlets.

2. Write a simple (one character) test case for reading your database 
data and debug what is going on. At some point there will be something 
like a getReader(), getWriter() or similar that doesn't specify an 
encoding and this will be the problem.

Personally I find remote debugging invaluable in cases such as this so I 
can be sure I am seeing the real data.

Mark


Re: Arabic encoding

Posted by Mark Thomas <ma...@apache.org>.
 From re-reading this thread it sounds as if an invalid assumption is 
being made somewhere about the encoding of your database data.

I would suggest the following:
1. Use res.setContentType("text/html; charset=UTF-8") or
res.setContentType("text/html; charset=windows-1256") in your servlets.

2. Write a simple (one character) test case for reading your database 
data and debug what is going on. At some point there will be something 
like a getReader(), getWriter() or similar that doesn't specify an 
encoding and this will be the problem.
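
As a rough sketch of point 2, the following plain Java program shows that the bytes produced by a Writer depend entirely on the charset it is bound to. It uses only JDK classes; the class name ExplicitEncodingDemo is hypothetical, and a real servlet would obtain its writer from the response rather than build one by hand.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: the same text produces different bytes depending
// on the charset the Writer was created with.
class ExplicitEncodingDemo {

    // Write text through a Writer bound to an explicit charset and return
    // the bytes that would go out on the wire.
    static byte[] encode(String text, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, charset)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String ain = "\u0639"; // Arabic letter ain

        System.out.println(encode(ain, StandardCharsets.UTF_8).length);    // prints 2
        System.out.println(encode(ain, StandardCharsets.US_ASCII).length); // prints 1 (a '?')
    }
}
```

An OutputStreamWriter created without a charset argument falls back to the platform default, which is the kind of unspecified encoding worth looking for when debugging.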

Personally I find remote debugging invaluable in cases such as this so I 
can be sure I am seeing the real data.

Mark



Re: Arabic encoding

Posted by Fadwa Barham <fs...@yahoo.com>.
Thanks for your reply.
 
I agree with you that UTF-8 encoding is suitable for all cases. But in TC4 with JDK 1.3, I wrote the servlets, compiled them, and used data from Oracle with US7ASCII encoding, and I didn't set any encoding except:
  pw.println("<meta http-equiv=\"Content-Language\" content=\"ar-sa\">");
  pw.println("<META http-equiv=Content-Type content=\"text/html;charset=windows-1256\">");
and the page displayed all the characters correctly.
I think Sun Microsystems and Tomcat made encoding-related changes in the new packages.
But how do I deal with these changes? Is there some special setup I have to do?
 
thanks 
Fadwa
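
Benson Margulies's suggestion quoted in this thread (CP1256 bytes being treated as ISO-8859-1 and passed through unchanged) can be sketched in plain Java as follows. This only demonstrates the byte-level mechanism; whether the Oracle driver or the older Tomcat versions actually behaved this way is an assumption, and the class name Latin1PassThroughDemo is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class: bytes misread as ISO-8859-1 survive a round trip
// unchanged, so a browser can still interpret them as windows-1256.
class Latin1PassThroughDemo {

    // Decode bytes as ISO-8859-1 and encode them again. Every byte value maps
    // to exactly one character, so the bytes come back unchanged.
    static byte[] throughLatin1(byte[] raw) {
        String misread = new String(raw, StandardCharsets.ISO_8859_1);
        return misread.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Four bytes that spell an Arabic word in windows-1256 (CP1256).
        byte[] cp1256 = {(byte) 0xDA, (byte) 0xD1, (byte) 0xC8, (byte) 0xED};

        byte[] sent = throughLatin1(cp1256);
        System.out.println(Arrays.equals(cp1256, sent)); // prints true

        // A browser told "charset=windows-1256" would decode the bytes as Arabic.
        // The charset is optional in some Java runtimes, hence the check.
        if (Charset.isSupported("windows-1256")) {
            String shown = new String(sent, Charset.forName("windows-1256"));
            System.out.println(shown.equals("\u0639\u0631\u0628\u064A"));
        }
    }
}
```

Under that assumption the meta tag alone would have been enough in the old setup, and any step that starts converting the text through a stricter charset would break it.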


Mark Thomas <ma...@apache.org> wrote:
There are lots of potential pitfalls when using non-default character 
encodings. It is easy to make mistakes both with Tomcat settings and 
with your code.

To sort out the tomcat settings, get the following index.jsp to work for 
whatever text you supply to the form. I have tested this with the latest 
TC4 and TC5 code and it works for me with any text I choose to enter.

Once you have this working, you can look at your application and see 
what is different.

Mark









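<%@ page contentType="text/html; charset=UTF-8" %>
<html>
  <body>
    <p>Data posted to this form was:
    <%
       request.setCharacterEncoding("UTF-8");
       out.print(request.getParameter("mydata"));
    %>
    </p>
    <form method="POST" action="index.jsp"
          enctype="application/x-www-form-urlencoded">
      <input type="text" name="mydata">
      <input type="submit" value="Submit">
      <input type="reset" value="Reset">
    </form>
  </body>
</html>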




Fadwa Barham wrote:
> While I was searching for a solution for the encoding, I found this
> 
> There is a standard for encoding URIs (http://www.w3.org/International/O-URL-
> code.html) but this standard is not consistently followed by clients. This 
> causes a number of problems.
> 
> The functionality provided by Tomcat (4 and 5) to handle this less than ideal 
> situation is described below.
> 
> 1. The Coyote HTTP/1.1 connector has a useBodyEncodingForURI attribute which 
> if set to true will use the request body encoding to decode the URI query 
> parameters.
> - The default value is true for TC4 (breaks spec but gives consistent 
> behaviour across TC4 versions)
> - The default value is false for TC5 (spec compliant but there may be 
> migration issues for some apps)
> 2. The Coyote HTTP/1.1 connector has a URIEncoding attribute which defaults to 
> ISO-8859-1.
> 3. The parameters class (o.a.t.u.http.Parameters) has a QueryStringEncoding 
> field which defaults to the URIEncoding. It must be set before the parameters 
> are parsed to have an effect.
> 
> Things to note regarding the servlet API:
> 1. HttpServletRequest.setCharacterEncoding() normally only applies to the 
> request body NOT the URI.
> 2. HttpServletRequest.getPathInfo() is decoded by the web container.
> 3. HttpServletRequest.getRequestURI() is not decoded by container.
> 
> Other tips:
> 1. Use POST with forms to return parameters as the parameters are then part of 
> the request body.
> 
> 
> Is this means that the changes between tc4 and tc5 about encoding is the reason why I can't have the write encoding in the new versions of tomcat? and if so, how to solve the problem?
> 
> Thanks
> 
> ----- Original Message ----- 
> From: "Fadwa Barham" 
> To: "Tomcat Users List" 
> Sent: Tuesday, March 01, 2005 3:24 AM
> Subject: Re: Arabic encoding
> 
> 
> 
>>As tomcat 4.1.31 is suitable for arabic and it seems until now that tomcat 
>>4.1.31 solved the jndi datasource problems: Intermittent dB connection 
>>Failures and Random Connection closed Exceptions
>>I will use tomcat 4.1.31 until I can configure the latest versions of 
>>tomcat.
>>I feel not lucky
>>----- Original Message ----- 
>>From: "Fadwa Barham" 
>>To: "Tomcat Users List" 
>>Sent: Tuesday, March 01, 2005 2:39 AM
>>Subject: Re: Arabic encoding
>>
>>
>>
>>>I tested many Tomcat versions. Up to tomcat 4.1.31 I found no problems
>>>with Arabic, but when I tried tomcat-4.1.18 and newer versions, I faced
>>>the same problem.
>>>
>>>----- Original Message ----- 
>>>From: "Benson Margulies" 
>>>To: "Tomcat Users List" 
>>>Sent: Sunday, February 27, 2005 4:08 PM
>>>Subject: RE: Arabic encoding
>>>
>>>
>>>
>>>>It depends on what the Oracle JDBC driver does with byte values that are
>>>>not legitimate US7ASCII. If, for some reason, it treated the data as
>>>>ISO-8859-1 instead of US7ASCII, then it might have streamed out through
>>>>tomcat, and the browser would have auto-detected the CP1256 pretending
>>>>to be ISO-8859-1.
>>>>
>>>>-----Original Message-----
>>>>From: Fadwa Barham [mailto:fadwa@najah.edu]
>>>>Sent: Sunday, February 27, 2005 1:43 PM
>>>>To: Tomcat Users List
>>>>Subject: Re: Arabic encoding
>>>>
>>>>But I wonder why the old Tomcat and Java displayed Arabic correctly, given
>>>>that I use the same classes12.jar with both the old and the new versions.
>>>>I want to know what the difference is: what encoding did they stop
>>>>supporting? It looks like Tomcat cannot understand the old Java, because
>>>>I have to change the encoding to Arabic (Windows) in Internet Explorer
>>>>each time I request the servlet, and when I do this, every Arabic
>>>>character is displayed correctly.
>>>>I think it is better to understand the problem and the changes so I can
>>>>handle the problem if I face it again in newer versions of Tomcat
>>>>or Java.
>>>>I know that having the database in US7ASCII is not good, but changing the
>>>>database encoding each time I face the problem is not the right way. I
>>>>may change it this time, but I need to understand.
>>>>Thanks
>>>>
>>>>----- Original Message -----
>>>>From: "Benson Margulies" 
>>>>To: "Tomcat Users List" 
>>>>Sent: Sunday, February 27, 2005 12:44 AM
>>>>Subject: RE: Arabic encoding
>>>>
>>>>
>>>>
>>>>>Oracle's ODBC driver will transcode from the database to UTF-16 based on
>>>>>the database encoding. If the database is in US7ASCII, this is a
>>>>>destructive process for Arabic. The only alternative I can think of is
>>>>>to do all your database I/O in hex.
>>>>>
>>>>>-----Original Message-----
>>>>>From: Fadwa Barham [mailto:fadwa@najah.edu]
>>>>>Sent: Saturday, February 26, 2005 1:20 PM
>>>>>To: Tomcat Users List
>>>>>Subject: Re: Arabic encoding
>>>>>
>>>>>I use an Oracle 7 database, and the NLS language is
>>>>>American_America.US7ASCII, and it is not easy to change it to UTF-8.
>>>>>Besides, the question is: a servlet works fine on Tomcat 4.0.6, so why did
>>>>>it stop working with the new versions? What changes were made to the
>>>>>encoding in Tomcat?
>>>>>Do I need tomcat-i18n-ar.jar? And if so, where can I get it?
>>>>>I can't determine where the problem is: is it from the new Java or the
>>>>>new Tomcat?
>>>>>Thanks in advance
>>>>>
>>>>>----- Original Message -----
>>>>>From: "Benson Margulies" 
>>>>>To: "Tomcat Users List" 
>>>>>Sent: Wednesday, February 23, 2005 11:26 PM
>>>>>Subject: RE: Arabic encoding
>>>>>
>>>>>
>>>>>
>>>>>>What database? Do you have the database set up to deliver Unicode, or
>>>>>>CP1256, correctly? Note that not all Arabic fits into CP1256, you might
>>>>>>really be better off with UTF-8.
>>>>>>
>>>>>>---------------------------------------------------------------------
>>>>>>To unsubscribe, e-mail: tomcat-user-unsubscribe@jakarta.apache.org
>>>>>>For additional commands, e-mail: tomcat-user-help@jakarta.apache.org



Re: Arabic encoding

Posted by Mark Thomas <ma...@apache.org>.
There are lots of potential pitfalls when using non-default character 
encodings. It is easy to make mistakes both with Tomcat settings and 
with your code.
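
One such pitfall, touched on earlier in this thread, is decoding bytes with the wrong charset. The following is only a minimal sketch and not taken from Tomcat or any JDBC driver: the class name CharsetRoundTripDemo, the helper roundTrip and the two sample bytes (assumed to be the CP1256 encoding of alef and beh) are all illustrative. It shows why bytes mislabelled as ISO-8859-1 can pass through unchanged, while treating them as US-ASCII destroys them.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; not part of Tomcat or of any JDBC driver.
final class CharsetRoundTripDemo {

    // Decodes the bytes with the given charset, then encodes them back again.
    static byte[] roundTrip(byte[] in, Charset cs) {
        return new String(in, cs).getBytes(cs);
    }

    public static void main(String[] args) {
        // Assumed sample data: 0xC7 0xC8 are the CP1256 bytes for alef and beh.
        byte[] cp1256 = { (byte) 0xC7, (byte) 0xC8 };

        // ISO-8859-1 maps every byte value to a character, so nothing is lost.
        byte[] viaLatin1 = roundTrip(cp1256, StandardCharsets.ISO_8859_1);

        // US-ASCII has no mapping for bytes above 0x7F; they are replaced.
        byte[] viaAscii = roundTrip(cp1256, StandardCharsets.US_ASCII);

        System.out.println("ISO-8859-1 keeps bytes: " + Arrays.equals(cp1256, viaLatin1));
        System.out.println("US-ASCII keeps bytes:   " + Arrays.equals(cp1256, viaAscii));
    }
}
```

If the bytes survive all the way to the browser, switching the browser to Arabic (Windows) can still render them, which would match the behaviour reported earlier in the thread. Once they have been replaced, no browser setting can recover them.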

To sort out the Tomcat settings, get the following index.jsp to work for 
whatever text you supply to the form. I have tested this with the latest 
TC4 and TC5 code and it works for me with any text I choose to enter.

Once you have this working, you can look at your application and see 
what is different.

Mark

<%@ page contentType="text/html; charset=UTF-8" %>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
   <head>
     <title>Encoding fun</title>
   </head>
   <body>
     <p>Data posted to this form was:
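     <%
       // Must be called before the first getParameter() call to have an effect.
       request.setCharacterEncoding("UTF-8");
       String mydata = request.getParameter("mydata");
       // The parameter is null on the first GET, before the form is posted.
       out.print(mydata == null ? "" : mydata);
     %>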

     </p>
     <form method="post" action="index.jsp"
           enctype="application/x-www-form-urlencoded">
       <input type="text" name="mydata">
       <input type="submit" value="Submit" />
       <input type="reset" value="Reset" />
     </form>
   </body>
</html>
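
The page above uses POST, so the parameter travels in the request body. For GET requests, the notes quoted below say the query string is decoded with the connector's URIEncoding, which defaults to ISO-8859-1. Here is a rough sketch of what that default does to UTF-8 input. It uses java.net.URLDecoder as a stand-in for the container's own decoding, which is an assumption made for illustration, and the class name QueryDecodingDemo and the helper decode are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class; URLDecoder stands in for the container's decoder.
final class QueryDecodingDemo {

    // Decodes a percent-encoded value using the named charset.
    static String decode(String raw, String charsetName) {
        try {
            return URLDecoder.decode(raw, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    // Formats each char as a hex code unit so the output is console-safe.
    static String hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            sb.append(String.format("U+%04X ", (int) c));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // "%D8%B9" is the UTF-8 percent-encoding of the Arabic letter ain (U+0639).
        String raw = "%D8%B9";
        System.out.println("as UTF-8:      " + hex(decode(raw, "UTF-8")));
        System.out.println("as ISO-8859-1: " + hex(decode(raw, "ISO-8859-1")));
    }
}
```

Decoded as UTF-8 the value is the single character it was meant to be. Decoded as ISO-8859-1 it becomes two unrelated Latin-1 characters, which is the kind of garbling to expect when a browser sends UTF-8 in the query string and the connector is left at its default.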

Fadwa Barham wrote:
> While I was searching for a solution for the encoding, I found this
>  
> There is a standard for encoding URIs (http://www.w3.org/International/O-URL-
> code.html) but this standard is not consistently followed by clients. This 
> causes a number of problems.
> 
> The functionality provided by Tomcat (4 and 5) to handle this less than ideal 
> situation is described below.
> 
> 1. The Coyote HTTP/1.1 connector has a useBodyEncodingForURI attribute which 
> if set to true will use the request body encoding to decode the URI query 
> parameters.
>   - The default value is true for TC4 (breaks spec but gives consistent 
> behaviour across TC4 versions)
>   - The default value is false for TC5 (spec compliant but there may be 
> migration issues for some apps)
> 2. The Coyote HTTP/1.1 connector has a URIEncoding attribute which defaults to 
> ISO-8859-1.
> 3. The parameters class (o.a.t.u.http.Parameters) has a QueryStringEncoding 
> field which defaults to the URIEncoding. It must be set before the parameters 
> are parsed to have an effect.
> 
> Things to note regarding the servlet API:
> 1. HttpServletRequest.setCharacterEncoding() normally only applies to the 
> request body NOT the URI.
> 2. HttpServletRequest.getPathInfo() is decoded by the web container.
> 3. HttpServletRequest.getRequestURI() is not decoded by container.
> 
> Other tips:
> 1. Use POST with forms to return parameters as the parameters are then part of 
> the request body.
> 
> 
> Does this mean that the encoding changes between TC4 and TC5 are the reason why I can't get the right encoding in the new versions of Tomcat? And if so, how can I solve the problem?
>  
> Thanks