Posted to user@struts.apache.org by Larry Young <ly...@dalmatian.com> on 2004/07/21 21:05:29 UTC

character encoding

Hello,

         I've run into a bit of a problem and I'd like to know how others 
have solved it.

         It's basically a character encoding issue.  I post my struts-based 
JSP page to the user, they enter some data, and then submit the page back 
to my Action.  The data they enter may contain multi-byte characters.  If I 
pull the data out of the parameter list (using PropertyUtils), I'm getting 
a whole bunch of extra characters for each multi-byte character (ASCII 
works just fine).  If I set the encoding value in the Request to UTF-8 
before calling PropertyUtils, it seems to work great for non-Form data 
values.  However, since the Form is populated before my Action is 
called, these String values have already been decoded and are wrong.
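
[Editorial sketch] The "extra characters" symptom described above is consistent with UTF-8 bytes being decoded as ISO-8859-1, which is an assumption about this setup rather than something the post confirms. The class and method names below are hypothetical and are not part of Struts; the sketch only uses the standard Java charset classes.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are illustrative, not part of Struts.
class MojibakeDemo {

    // Simulates a container that decodes UTF-8 request bytes as ISO-8859-1:
    // every byte of a multi-byte character becomes its own char.
    static String misdecode(String original) {
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    // Undoes the wrong decoding. ISO-8859-1 maps each byte to exactly one
    // char, so the original bytes can be recovered and decoded as UTF-8.
    static String repair(String garbled) {
        byte[] rawBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(rawBytes, StandardCharsets.UTF_8);
    }
}
```

For example, MojibakeDemo.misdecode("\u00e9") yields the two characters "\u00c3\u00a9", while plain ASCII passes through unchanged. Calling repair() on an already populated Form value is only a workaround; it is valid only if the wrong decoding really was ISO-8859-1.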

         I'd like to hear how others have solved this problem.  I can see 
that one solution is to replace the RequestProcessor and hardcode the 
"setEncoding" on the Request to UTF-8, or subclass the whole 
ActionServlet.  Are there any cleaner solutions?  I can't believe I'm the 
only one to have run across this problem!  I'm not THAT unlucky! :)

         Any help is most gratefully appreciated.

--- regards ---
Larry


--------------------------
Larry Young
The Dalmatian Group
www.dalmatian.com 



---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@struts.apache.org
For additional commands, e-mail: user-help@struts.apache.org


Re[3]: character encoding

Posted by Carl-Eric Menzel <cm...@bitforce.com>.
> Then I'm out of luck. That's the biggest problem with Struts' lack of
> support for the accept-charset attribute. *Most of the time* it works
> that if you send the response in UTF-8 the next request will come in
> as UTF-8 too. That's what I'm doing now - I send out only UTF-8 forms
> and assume that I get the same back. It's an ugly hack, but the only
> way that seems to work at the moment.

> I asked a few weeks ago if there was any way for me to extend the form
> tag to support this attribute, or whether there is any good reason why
> it is not implemented. So far I haven't received an answer.

PS: While searching for a solution I found that the HTTP spec actually
provides the browsers with a way to *specify* the encoding they're
sending, which would completely solve this issue. It appears that the
only browser that supported it *was* Mozilla. They found out that this
extended (but conforming to the spec!) Content-Type header made so
many broken CGI-scripts puke that they removed this feature again.
*sigh*
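
[Editorial sketch] The mechanism mentioned in the PS is presumably a charset parameter on the request's Content-Type header. When a client does send it, the server can read it instead of guessing. The helper below is a hypothetical, simplified parser written for illustration; in a servlet container, ServletRequest.getCharacterEncoding() exposes the same information and returns null when the client sent no charset.

```java
import java.util.Locale;

// Hypothetical helper; a simplified parser, not a full header grammar.
class ContentTypeCharset {

    // Returns the charset parameter of a Content-Type value,
    // or the given fallback when the header or parameter is missing.
    static String charsetOf(String contentType, String fallback) {
        if (contentType == null) {
            return fallback;
        }
        String[] parts = contentType.split(";");
        // parts[0] is the media type; parameters start at index 1.
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = param.substring("charset=".length()).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? fallback : value;
            }
        }
        return fallback;
    }
}
```

With the header "application/x-www-form-urlencoded; charset=UTF-8" this returns "UTF-8"; with no charset parameter it returns the fallback, which is the case this thread is about.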

Carl-Eric
-- 
Answer  : Because it makes the text harder to read. | Carl-Eric Menzel
Question: Why is that so bad?                       | PGP ID: 808F4A8E
Answer  : Top-posting.                              | Please don't send
Question: What is the worst bad habit in emails?    | HTML mails.




Re[2]: character encoding

Posted by Carl-Eric Menzel <cm...@bitforce.com>.
> I have added an acceptCharset attribute to the FormTag.

> Should be available in the next nightly build - 22/07/2004

Hooray :) Thanks a lot, this is going to be very useful.

Carl-Eric


Re: character encoding

Posted by Niall Pemberton <ni...@blueyonder.co.uk>.
I have added an acceptCharset attribute to the FormTag.

<html:form action="abc.do" acceptCharset="UTF-8">

which will generate something like

<form action="abc.do" method="post" accept-charset="UTF-8">

Should be available in the next nightly build - 22/07/2004

Niall

----- Original Message ----- 
From: "Carl-Eric Menzel (bitFORCE media)" <cm...@bitforce.com>
To: "Struts Users Mailing List" <us...@struts.apache.org>
Cc: "Larry Young" <ly...@dalmatian.com>
Sent: Wednesday, July 21, 2004 8:36 PM
Subject: Re[2]: character encoding


> > Carl-Eric,
>
> >          Yes, I tried the charset on the form but found it didn't do any
> > good.
>
> >          But what do you force the Encoding to in your Filter?  How can
> > you know with any certitude how the browser encoded the data values before
> > sending it to you??  It probably works well if the browser is set up to
> > "auto-select" the encoding, but what do you do if they have it explicitly
> > set to something other than what you are assuming?
>
> Then I'm out of luck. That's the biggest problem with Struts' lack of
> support for the accept-charset attribute. *Most of the time* it works
> that if you send the response in UTF-8 the next request will come in
> as UTF-8 too. That's what I'm doing now - I send out only UTF-8 forms
> and assume that I get the same back. It's an ugly hack, but the only
> way that seems to work at the moment.
>
> I asked a few weeks ago if there was any way for me to extend the form
> tag to support this attribute, or whether there is any good reason why
> it is not implemented. So far I haven't received an answer.
>
> Carl-Eric


Re[2]: character encoding

Posted by Carl-Eric Menzel <cm...@bitforce.com>.
> Carl-Eric,

>          Yes, I tried the charset on the form but found it didn't do any good.

>          But what do you force the Encoding to in your Filter?  How can you
> know with any certitude how the browser encoded the data values before
> sending it to you??  It probably works well if the browser is set up to
> "auto-select" the encoding, but what do you do if they have it explicitly
> set to something other than what you are assuming?

Then I'm out of luck. That's the biggest problem with Struts' lack of
support for the accept-charset attribute. *Most of the time* it works
that if you send the response in UTF-8 the next request will come in
as UTF-8 too. That's what I'm doing now - I send out only UTF-8 forms
and assume that I get the same back. It's an ugly hack, but the only
way that seems to work at the moment.
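
[Editorial sketch] The "assume UTF-8 comes back" approach can be given a partial safety net by checking whether the raw bytes are well-formed UTF-8. This is not something described in the thread; it is a minimal sketch with a hypothetical class name, using the standard java.nio.charset decoder in strict mode. A failed check proves the data is not UTF-8; a passed check proves nothing, since pure ASCII and some other byte sequences are valid in several encodings.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only.
class Utf8Check {

    // Returns true if the bytes decode as UTF-8 without any malformed
    // or unmappable sequence; the decoder is told to report errors
    // instead of silently substituting replacement characters.
    static boolean isWellFormedUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

The bytes of "caf\u00e9" encoded as UTF-8 pass the check, while the ISO-8859-1 form (a lone 0xE9 byte after "caf") fails it.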

I asked a few weeks ago if there was any way for me to extend the form
tag to support this attribute, or whether there is any good reason why
it is not implemented. So far I haven't received an answer.

Carl-Eric


Re: character encoding

Posted by Larry Young <ly...@dalmatian.com>.
Carl-Eric,

         Yes, I tried the charset on the form but found it didn't do any good.

         But what do you force the Encoding to in your Filter?  How can you 
know with any certitude how the browser encoded the data values before 
sending it to you??  It probably works well if the browser is set up to 
"auto-select" the encoding, but what do you do if they have it explicitly 
set to something other than what you are assuming?

--- regards ---
Larry


At 12:07 PM 7/21/04, you wrote:

> >          I'd like to hear how others have solved this problem.  I can see
> > that one solution is to replace the RequestProcessor and hardcode the
> > "setEncoding" on the Request to UTF-8, or subclass the whole
> > ActionServlet.  Are there any cleaner solutions?  I can't believe I'm the
> > only one to have run across this problem!  I'm not THAT unlucky! :)
>
>I am using a Filter (from the Servlet API) that gets the request
>before anything else in the chain and calls setCharacterEncoding() on
>it before passing it on.
>
>What would be great, just to get a little more consistency into this,
>would be if the html:form-tag would finally support the accept-charset
>attribute as specified in HTML4.01.
>
>HTH
>Carl-Eric



Re: character encoding

Posted by Carl-Eric Menzel <cm...@bitforce.com>.
>          I'd like to hear how others have solved this problem.  I can see
> that one solution is to replace the RequestProcessor and hardcode the
> "setEncoding" on the Request to UTF-8, or subclass the whole 
> ActionServlet.  Are there any cleaner solutions?  I can't believe I'm the
> only one to have run across this problem!  I'm not THAT unlucky! :)

I am using a Filter (from the Servlet API) that gets the request
before anything else in the chain and calls setCharacterEncoding() on
it before passing it on.
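
[Editorial sketch] In such a filter the relevant call would be request.setCharacterEncoding("UTF-8") made before chain.doFilter(...), so that it takes effect before any parameter is read. What that call changes is the charset used to decode the submitted form data. The stand-alone sketch below imitates that decoding step with java.net.URLDecoder so the effect can be seen without a container; FormBodyDecoder is a hypothetical name and this is not how any particular container implements it.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the container's form parsing.
class FormBodyDecoder {

    // Parses an application/x-www-form-urlencoded body, decoding the
    // percent-escaped bytes with the given character encoding.
    static Map<String, String> parse(String body, String encoding) {
        Map<String, String> params = new LinkedHashMap<>();
        if (body.isEmpty()) {
            return params;
        }
        try {
            for (String pair : body.split("&")) {
                int eq = pair.indexOf('=');
                String rawName = eq < 0 ? pair : pair.substring(0, eq);
                String rawValue = eq < 0 ? "" : pair.substring(eq + 1);
                params.put(URLDecoder.decode(rawName, encoding),
                           URLDecoder.decode(rawValue, encoding));
            }
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown encoding: " + encoding, e);
        }
        return params;
    }
}
```

Parsing the body "name=%C3%A9" with "UTF-8" gives the single character "\u00e9"; parsing the same body with "ISO-8859-1" gives the two characters "\u00c3\u00a9". The submitted bytes are identical in both cases, which is why the encoding has to be fixed before the Form is populated.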

What would be great, just to get a little more consistency into this,
would be if the html:form-tag would finally support the accept-charset
attribute as specified in HTML4.01.

HTH
Carl-Eric