Posted to user@turbine.apache.org by "Kundrot, Steven" <St...@parexel.com> on 2002/11/06 17:03:39 UTC

Character Sets/Character Encoding

I'm running into a problem with character sets. I have a user base in
France, and they can enter characters with accents, etc. When we try to
store these in a database, they are corrupted. The corruption occurs during
the parsing of the request, so we must not be using the correct character
set or encoding. If I debug, I see that the RunData has a charset of UTF-8
and the ParameterParser has a character encoding of US-ASCII. We attempted
to set the "locale.default.charset" property in TurbineResources.properties
to ISO-8859-1, but there was no change: the RunData and ParameterParser
still listed the same encodings.

How can we get the correct character encoding? Strangely enough, if there
is properly encoded information in the database, the display works fine. It
is only when you submit a form to create or update that data that the
characters don't convert correctly.

Any help would be appreciated.



Steven Kundrot


The information transmitted in this communication is intended only for the
person or entity to which it is addressed and may contain confidential
and/or privileged material. Any review, retransmission, dissemination or
other use of, or taking of any action in reliance upon, this information by
persons or entities other than the intended recipient is prohibited. If you
received this in error, please destroy any copies, contact the sender and
delete the material from any computer.

Re: Character Sets/Character Encoding

Posted by Rajesh Thiharie <ra...@ggn.aithent.com>.
Please use UTF-8 or UTF-16 everywhere to be completely safe.
Otherwise there will be incompatibilities.
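
To illustrate the kind of incompatibility meant here, the following is a
minimal sketch (not taken from Turbine) of what happens when text is encoded
with one charset and decoded with another. The class and method names are
made up for the example, and it assumes a JDK that provides
java.nio.charset.StandardCharsets.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows how accented characters are damaged
// when sender and receiver disagree on the encoding.
public class MojibakeDemo {

    // Encode the text with the charset the sender used, then decode the
    // bytes with the charset the receiver assumed.
    public static String roundTrip(String text, Charset sent, Charset assumed) {
        return new String(text.getBytes(sent), assumed);
    }

    public static void main(String[] args) {
        String name = "Fran\u00e7ois"; // "François", written with an escape

        // Both sides agree: the text survives.
        System.out.println(roundTrip(name, StandardCharsets.UTF_8,
                                     StandardCharsets.UTF_8));

        // UTF-8 bytes read as ISO-8859-1: the two bytes of the cedilla
        // become two separate characters.
        System.out.println(roundTrip(name, StandardCharsets.UTF_8,
                                     StandardCharsets.ISO_8859_1));

        // UTF-8 bytes read as US-ASCII: the non-ASCII bytes cannot be
        // decoded at all and are replaced.
        System.out.println(roundTrip(name, StandardCharsets.UTF_8,
                                     StandardCharsets.US_ASCII));
    }
}
```

Only the first call returns the original string, which is why the same
encoding has to be used at every step between the browser and the database.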



--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>


Re: Character Sets/Character Encoding

Posted by Laurie Harper <zo...@holoweb.net>.
Remember to ensure you're using (and declaring) UTF-8 as the encoding of
every response if you go that route, and hope you don't run into any devices
which don't support it, or which support it but refuse to submit it...

In this case, though, there is no need to go to UTF-8. The users are in
France and happy with ISO-8859-1. That's the default character set HTTP/1.1
assumes for text when none is declared, so if you can stick to it you'll
have far fewer problems. [You do need to be *sure* you don't need anything
outside 8859-1, of course!].
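
One rough way to test that assumption is to ask the JDK whether a given
string can be encoded in ISO-8859-1. The sketch below uses a hypothetical
class name and sample strings; note that the French ligature "œ" (U+0153)
and the euro sign (U+20AC) are both outside ISO-8859-1.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: reports whether text fits in ISO-8859-1.
public class Latin1Check {

    // A fresh encoder is created per call because CharsetEncoder
    // instances are not safe for concurrent use.
    public static boolean fitsLatin1(String text) {
        CharsetEncoder encoder = StandardCharsets.ISO_8859_1.newEncoder();
        return encoder.canEncode(text);
    }

    public static void main(String[] args) {
        // "déjà vu": accented letters that ISO-8859-1 contains.
        System.out.println(fitsLatin1("d\u00e9j\u00e0 vu"));
        // "cœur": the oe ligature is not part of ISO-8859-1.
        System.out.println(fitsLatin1("c\u0153ur"));
        // "10 €": neither is the euro sign.
        System.out.println(fitsLatin1("10 \u20ac"));
    }
}
```

If checks like this ever fail on real user input, that is a sign the
application needs UTF-8 after all.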

Assuming that's so, and you explicitly use 8859-1 to generate all responses,
you should be fine. The only other thing to watch out for is if you ever
call String.getBytes() or use an OutputStream, you must ensure that you
specify the correct encoding. That's true whether you're using UTF-8,
Latin-1, or whatever.
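
As a small sketch of that last point (class and method names are
hypothetical), the encoding can be passed explicitly both to
String.getBytes() and to the OutputStreamWriter that wraps an OutputStream,
so the platform default is never used by accident:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: always names the charset instead of relying
// on the platform default.
public class ExplicitEncoding {

    // getBytes() with no argument uses the platform default charset;
    // passing the charset makes the result predictable.
    public static byte[] toLatin1(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Characters written to an OutputStream go through a Writer that is
    // told which charset to use.
    public static byte[] writeLatin1(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer =
                 new OutputStreamWriter(out, StandardCharsets.ISO_8859_1)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String accent = "\u00e9"; // "é"
        System.out.println(toLatin1(accent).length);     // one byte in Latin-1
        System.out.println(writeLatin1(accent).length);  // same via the Writer
        System.out.println(accent.getBytes(StandardCharsets.UTF_8).length);
    }
}
```

The same pattern applies with StandardCharsets.UTF_8 if the application
standardises on UTF-8 instead.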

HTH,

L. 



Re: Character Sets/Character Encoding

Posted by Bart Selders <ba...@ibanx.nl>.
We had the same issue with another system, and solved it by creating a
filter (according to the Servlet 2.3 spec) that changes the character
encoding of the request just before the actual servlet is invoked.
This works for Tomcat 4 and all other 2.3-based servlet containers. See
also the JavaWorld article on filters.

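   import java.io.IOException;
   import java.io.UnsupportedEncodingException;
   import javax.servlet.*;

   // The class name is illustrative; doFilter() is the part that matters.
   public class EncodingFilter implements Filter
   {
      // Saved in init() so doFilter() can log through the servlet context.
      private FilterConfig config;

      public void init(FilterConfig config) { this.config = config; }
      public void destroy() { this.config = null; }

      public void doFilter(ServletRequest request, ServletResponse response,
                   FilterChain chain) throws IOException, ServletException
      {
         try
         {
            // Must run before any request parameter is read.
            request.setCharacterEncoding("UTF-8");
         }
         catch (UnsupportedEncodingException e)
         {
            config.getServletContext().
               log("Error setting UTF-8 encoding: " + e.getMessage());
         }

         // Perform any other filters that are chained after this one.
         // This includes calling the requested servlet!
         chain.doFilter(request, response);
      }
   }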

Tomcat 3.3.x has a special interceptor in server.xml that does the same.

    <DecodeInterceptor DefaultEncoding="UTF-8" />

Notice that this solution is independent of Turbine.
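
To see why the request encoding matters for form data, here is a rough
stand-alone simulation of the decoding step, using java.net.URLDecoder in
place of a real servlet container (which may differ in detail). The class
name and the sample value are made up for the example; "%C3%A9t%C3%A9" is
"été" as a browser would submit it from a UTF-8 page.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class: decodes one URL-encoded form value with
// different assumed encodings.
public class ParameterDecodingDemo {

    // Wraps URLDecoder.decode so callers do not have to handle the
    // checked exception for an unknown encoding name.
    public static String decode(String raw, String encoding) {
        try {
            return URLDecoder.decode(raw, encoding);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(
                "Unknown encoding: " + encoding, e);
        }
    }

    public static void main(String[] args) {
        String raw = "%C3%A9t%C3%A9";

        // Decoded with the encoding the browser actually used.
        System.out.println(decode(raw, "UTF-8"));

        // Decoded with a different assumed encoding: every accented
        // letter turns into two wrong characters.
        System.out.println(decode(raw, "ISO-8859-1"));
    }
}
```

Calling setCharacterEncoding() in the filter plays the role of the second
argument here: it tells the container which encoding to assume when it
parses the submitted parameters.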

Success,

Bart


*************************************************************************
The information contained in this communication is confidential and is
intended solely for the use of the individual or entity to whom it is
addressed. You should not copy, disclose or distribute this communication
without the authority of iBanx bv. iBanx bv is neither liable for
the proper and complete transmission of the information has been maintained
nor that the communication is free of viruses, interceptions or interference.

If you are not the intended recipient of this communication please return
the communication to the sender and delete and destroy all copies.
