Posted to dev@cocoon.apache.org by Jeremy Quinn <je...@media.demon.co.uk> on 2003/03/03 19:05:33 UTC
form encoding problems
Hi All,
This is possibly a trivial mistake ... but I never came across it
before.
I have a search form for searching Lucene. Mozilla confirms the page is
in UTF-8 encoding.
I enter a string with accented characters into the query field, e.g.
'éclair' (e-acute).
The form comes back with the string now reading 'Ã©clair' (A-tilde,
Copyright sign). Mozilla says the encoding is still UTF-8. (The value
has been picked up by an InputModule and fed via the SiteMap to XSLT.)
The query string in the URL reads 'query=%C3%A9clair', which are the
unicodes for 'A-tilde' and 'Copyright' characters. (Which would imply
to me that the Browser incorrectly encoded the query.)
This makes me feel like I have done something really dumb, but I cannot
work out what ;)
Incidentally, the search form in the Cocoon Samples does exactly the
same thing!!
Any suggestions?
regards Jeremy
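A minimal Java sketch of the symptom described above, for anyone following along. It assumes nothing about Cocoon; the class name QueryDecodingDemo is made up for illustration, and URLDecoder.decode(String, Charset) needs Java 10 or later. It suggests that '%C3%A9' is simply the UTF-8 byte sequence for e-acute, and that 'A-tilde, Copyright sign' appears when those same two bytes are decoded as ISO-8859-1 on the receiving side.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class QueryDecodingDemo {

    // Decodes a percent-encoded value, interpreting the escaped bytes in the given charset.
    static String decode(String encoded, Charset charset) {
        return URLDecoder.decode(encoded, charset);
    }

    public static void main(String[] args) {
        String raw = "%C3%A9clair"; // the value seen in the query string

        // Bytes C3 A9 read as UTF-8: one character, e-acute.
        System.out.println(decode(raw, StandardCharsets.UTF_8));

        // The same bytes read as ISO-8859-1: two characters, A-tilde and Copyright sign.
        System.out.println(decode(raw, StandardCharsets.ISO_8859_1));
    }
}
```

If this sketch matches what is happening, the browser's encoding of the query is correct and the mismatch is on the decoding side.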
Re: form encoding problems
Posted by Jeremy Quinn <je...@media.demon.co.uk>.
On Monday, March 3, 2003, at 06:57 PM, Konstantin Piroumian wrote:
> Take a look at i18n samples, particularly the XSP page
> (/samples/i18n/simple.xsp) and try to enter something like that in the
> input
> box there, then submit and see the 'Hello Tomcat' paragraph ('Tomcat'
> should
> be replaced by the entered string). If this works correctly then take
> a look
> at the source code of simple.xsp - maybe that's what you are looking
> for.
>
OK, your XSP is using the @form-encoding="UTF-8" attribute in the
<xsp-request:get-parameter/> tag.
I am retrieving request parameters using InputModules.
I wonder what the Request InputModules do about form encoding?
regards Jeremy
Re: form encoding problems
Posted by Jeremy Quinn <je...@media.demon.co.uk>.
On Monday, March 3, 2003, at 06:57 PM, Konstantin Piroumian wrote:
> Hi!
>
> Take a look at i18n samples, particularly the XSP page
> (/samples/i18n/simple.xsp) and try to enter something like that in the
> input
> box there, then submit and see the 'Hello Tomcat' paragraph ('Tomcat'
> should
> be replaced by the entered string). If this works correctly then take
> a look
> at the source code of simple.xsp - maybe that's what you are looking
> for.
OK, that's a lead.
The browser is still doing its funny encoding, but you are handling it
properly somehow, and the string gets passed through correctly.
Many thanks, I've got something to examine!
I'll see if I can fix the search sample when I work out what it is.
regards Jeremy
Re: form encoding problems
Posted by Konstantin Piroumian <kp...@apache.org>.
Hi!
Take a look at i18n samples, particularly the XSP page
(/samples/i18n/simple.xsp) and try to enter something like that in the input
box there, then submit and see the 'Hello Tomcat' paragraph ('Tomcat' should
be replaced by the entered string). If this works correctly then take a look
at the source code of simple.xsp - maybe that's what you are looking for.
I've checked the i18n samples in IE 5+ and everything was fine: even Chinese
and Japanese characters were copied/pasted/submitted/displayed correctly.
--
Konstantin
Re: form encoding problems
Posted by Bruno Dumon <br...@outerthought.org>.
On Wed, 2003-03-12 at 17:52, Jeremy Quinn wrote:
[...]
> As far as I can tell, yes it did solve it.
> I was making only one change at a time, after this one, it worked ;)
>
> > I'm wondering how the SetCharacterEncodingAction could actually do
> > anything useful. According to the servlet spec (I'm looking at version
> > 2.3), the request.setCharacterEncoding method only does something if
> > called before any data is read from the request.
> >
>
> Interesting
>
> > Since Cocoon itself reads parameters from the request (such as
> > cocoon-reload) before any action is executed, this action obviously
> > cannot do anything useful?
> >
>
> Hmmm
>
I just found out that actually it can work -- the setCharacterEncoding
method in Cocoon's request object doesn't correspond to the servlet
spec's setCharacterEncoding method but causes some Cocoon-specific
decoding/encoding trick to happen.
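A minimal sketch of what such a decoding/encoding trick could look like, assuming (this thread does not confirm it) that it amounts to turning the already-decoded string back into bytes using the container's encoding and then decoding those bytes with the form's encoding. The class and method names here are hypothetical, not Cocoon's.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ParameterRecoder {

    // Re-interprets a parameter value that the container decoded with
    // containerEncoding as if it had been decoded with formEncoding.
    static String recode(String value, Charset containerEncoding, Charset formEncoding) {
        if (value == null) {
            return null;
        }
        // Step 1: recover the original bytes. Step 2: decode them properly.
        return new String(value.getBytes(containerEncoding), formEncoding);
    }

    public static void main(String[] args) {
        // What the container hands over after decoding UTF-8 bytes as ISO-8859-1.
        String garbled = "\u00c3\u00a9clair";
        System.out.println(
            recode(garbled, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8));
    }
}
```

The round trip is only lossless when the container encoding maps every byte to exactly one character, as ISO-8859-1 does, which may be why container-encoding is expected to stay at ISO-8859-1.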
> > Wouldn't it be better if we logged a big warning in this action
> > pointing
> > to the container/form-encoding parameters in the web.xml (and the same
> > in its javadoc)?
> >
>
> Yes, this is a better technique.
> I had an idea there may be a configuration here, but could not find an
> example of it.
>
Since I now found out about the above, this action actually does
something useful, though I'm not sure if it's good to promote this if
we'd ever like to migrate to the servlet spec's setCharacterEncoding
method.
[...]
> > * in the web.xml: set container-encoding to ISO-8859-1 (don't know why
> > it's configurable because it should be ISO-8859-1 per spec), and set
> > form-encoding to the same encoding as the serializer.
>
> Let's put the config in, it will make it easier for others to see what to
> change. We work exclusively in UTF-8 for instance.
+1
Since the serializers are using UTF-8 by default, it's only logical that
the decoding also uses UTF-8 by default.
--
Bruno Dumon http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
bruno@outerthought.org bruno@apache.org
Re: form encoding problems
Posted by Jeremy Quinn <je...@media.demon.co.uk>.
On Tuesday, March 11, 2003, at 01:34 PM, Bruno Dumon wrote:
> On Mon, 2003-03-03 at 23:00, Jeremy Quinn wrote:
> [...]
>> I have got it. This was answered on the users list a while back, sorry
>> guys.
>>
>> Answer, use the SetCharacterEncodingAction in the Pipeline.
>>
>> Works with InputModules too.
>
> (a bit late to jump into this thread)
>
> Are you sure that it was adding this action that solved your problem?
>
As far as I can tell, yes it did solve it.
I was making only one change at a time, after this one, it worked ;)
> I'm wondering how the SetCharacterEncodingAction could actually do
> anything useful. According to the servlet spec (I'm looking at version
> 2.3), the request.setCharacterEncoding method only does something if
> called before any data is read from the request.
>
Interesting
> Since Cocoon itself reads parameters from the request (such as
> cocoon-reload) before any action is executed, this action obviously
> cannot do anything useful?
>
Hmmm
> Wouldn't it be better if we logged a big warning in this action
> pointing
> to the container/form-encoding parameters in the web.xml (and the same
> in its javadoc)?
>
Yes, this is a better technique.
I had an idea there may be a configuration here, but could not find an
example of it.
> As for the correct way to do things, this is what I understand from it:
>
> * set the encoding of the HTML serializer to the encoding you'd like
> to
> use
>
Yep
> * make sure a <head> element exists in the html you generate, so that
> the serializer can add a <meta ... tag into it (from Stefano's
> experiments, this does not seem to work with the xhtml serializer)
>
Good to know
> * in the web.xml: set container-encoding to ISO-8859-1 (don't know why
> it's configurable because it should be ISO-8859-1 per spec), and set
> form-encoding to the same encoding as the serializer.
Let's put the config in, it will make it easier for others to see what to
change. We work exclusively in UTF-8 for instance.
I find it very strange that ISO-8859-1 should be the standard (how
parochially European ;) surely in this day and age it should be UTF-8.
;)
BTW. While I was searching Cocoon's codebase looking for code that sets
up encodings, I found a FIXME in SQLTransformer, that makes the
un-configurable assumption that ISO-8859-1 is the encoding of your DB.
I have been meaning to find time to make this configurable.
thanks for the feedback
regards Jeremy
Re: form encoding problems
Posted by Bruno Dumon <br...@outerthought.org>.
On Mon, 2003-03-03 at 23:00, Jeremy Quinn wrote:
[...]
> I have got it. This was answered on the users list a while back, sorry
> guys.
>
> Answer, use the SetCharacterEncodingAction in the Pipeline.
>
> Works with InputModules too.
(a bit late to jump into this thread)
Are you sure that it was adding this action that solved your problem?
I'm wondering how the SetCharacterEncodingAction could actually do
anything useful. According to the servlet spec (I'm looking at version
2.3), the request.setCharacterEncoding method only does something if
called before any data is read from the request.
Since Cocoon itself reads parameters from the request (such as
cocoon-reload) before any action is executed, this action obviously
cannot do anything useful?
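To make the concern concrete, here is a toy Java model of "setting the encoding only works before any data is read". It is NOT the servlet API: ToyRequest and its methods are invented for illustration, and how real containers treat query strings versus POST bodies varies and is not modeled here. The only assumption it encodes is that parameters are parsed once, on first access, with whatever encoding is set at that moment.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Toy model only, not javax.servlet: parameters are parsed lazily and exactly once.
class ToyRequest {
    private final String rawQuery;
    private Charset encoding = StandardCharsets.ISO_8859_1; // default used by this model
    private Map<String, String> parsed;                      // filled on first read

    ToyRequest(String rawQuery) {
        this.rawQuery = rawQuery;
    }

    // Has an effect only if no parameter has been read yet.
    void setCharacterEncoding(Charset encoding) {
        this.encoding = encoding;
    }

    String getParameter(String name) {
        if (parsed == null) {
            parsed = new HashMap<>();
            for (String pair : rawQuery.split("&")) {
                int eq = pair.indexOf('=');
                if (eq < 0) {
                    continue; // skip malformed pairs in this sketch
                }
                parsed.put(URLDecoder.decode(pair.substring(0, eq), encoding),
                           URLDecoder.decode(pair.substring(eq + 1), encoding));
            }
        }
        return parsed.get(name);
    }

    public static void main(String[] args) {
        ToyRequest early = new ToyRequest("query=%C3%A9clair");
        early.setCharacterEncoding(StandardCharsets.UTF_8);
        System.out.println(early.getParameter("query")); // parsed as UTF-8

        ToyRequest late = new ToyRequest("cocoon-reload=true&query=%C3%A9clair");
        late.getParameter("cocoon-reload");               // first read triggers parsing
        late.setCharacterEncoding(StandardCharsets.UTF_8); // too late in this model
        System.out.println(late.getParameter("query"));   // still parsed as ISO-8859-1
    }
}
```

In this model the second request stays garbled because something touched a parameter before the encoding was set, which is the situation the question above is worried about.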
Wouldn't it be better if we logged a big warning in this action pointing
to the container/form-encoding parameters in the web.xml (and the same
in its javadoc)?
As for the correct way to do things, this is what I understand from it:
* set the encoding of the HTML serializer to the encoding you'd like to
use
* make sure a <head> element exists in the html you generate, so that
the serializer can add a <meta ... tag into it (from Stefano's
experiments, this does not seem to work with the xhtml serializer)
* in the web.xml: set container-encoding to ISO-8859-1 (don't know why
it's configurable because it should be ISO-8859-1 per spec), and set
form-encoding to the same encoding as the serializer.
--
Bruno Dumon http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
bruno@outerthought.org bruno@apache.org
Re: form encoding problems
Posted by Jeremy Quinn <je...@media.demon.co.uk>.
On Monday, March 3, 2003, at 09:39 PM, Torsten Curdt wrote:
>>> I recall a similar problem a long time ago. I solved it by changing
>>> some of the HTML serializer settings. The form encoding was fine in
>>> IE
>>> but crap in NS/Mozilla. Sorry, cannot really remember what it was :-/
>>
>> Have a beer, maybe you will remember ;)
>
> *hick* ...I know now ;-)
>
> ...well ok - I looked it up. We have set the serializer encoding to
> ISO-8859-1 and it worked. At least for German :)
I have got it. This was answered on the users list a while back, sorry
guys.
Answer, use the SetCharacterEncodingAction in the Pipeline.
Works with InputModules too.
Thanks for your help
regards Jeremy
Re: form encoding problems
Posted by Torsten Curdt <tc...@dff.st>.
> > I recall a similar problem a long time ago. I solved it by changing
> > some of the HTML serializer settings. The form encoding was fine in IE
> > but crap in NS/Mozilla. Sorry, cannot really remember what it was :-/
>
> Have a beer, maybe you will remember ;)
*hick* ...I know now ;-)
...well ok - I looked it up. We have set the serializer encoding to
ISO-8859-1 and it worked. At least for German :)
cheers
--
Torsten
Re: form encoding problems
Posted by Jeremy Quinn <je...@media.demon.co.uk>.
On Monday, March 3, 2003, at 06:16 PM, Torsten Curdt wrote:
>> Hi All,
>> This is possibly a trivial mistake ... but I never came across it
>> before.
>
> I recall a similar problem a long time ago. I solved it by changing
> some of the HTML serializer settings. The form encoding was fine in IE
> but crap in NS/Mozilla. Sorry, cannot really remember what it was :-/
Have a beer, maybe you will remember ;)
>
>> I have a search form for searching Lucene. Mozilla confirms the page
>> is in UTF-8 encoding.
>> I enter a string with accented characters into the query field, e.g.
>> 'éclair' (e-acute).
>> The form comes back with the string now reading 'Ã©clair'. (A-tilde,
>> Copyright sign). Mozilla says the encoding is still UTF-8. (The value
>> has been picked up by an InputModule and fed via the SiteMap to XSLT).
>> The query string in the URL reads 'query=%C3%A9clair', which are the
>> unicodes for 'A-tilde' and 'Copyright' characters. (Which would imply
>> to me that the Browser incorrectly encoded the query.)
>
> How do other browsers behave?
Same way, I just quoted Mozilla because it has a handy 'page info'
dialog.
>
>> This makes me feel like I have done something really dumb, but I
>> cannot work out what ;)
>
> Well, good luck :)
>
>> Any suggestions?
>
> Have a beer and then come back - maybe you'll see it then ;)
Cheers! Hic! %}
regards Jeremy
Re: form encoding problems
Posted by Torsten Curdt <tc...@dff.st>.
> Hi All,
>
> This is possibly a trivial mistake ... but I never came across it before.
I recall a similar problem a long time ago. I solved it by changing
some of the HTML serializer settings. The form encoding was fine in IE
but crap in NS/Mozilla. Sorry, cannot really remember what it was :-/
> I have a search form for searching Lucene. Mozilla confirms the page is
> in UTF-8 encoding.
>
> I enter a string with accented characters into the query field, e.g.
> 'éclair' (e-acute).
>
> The form comes back with the string now reading 'Ã©clair'. (A-tilde,
> Copyright sign). Mozilla says the encoding is still UTF-8. (The value
> has been picked up by an InputModule and fed via the SiteMap to XSLT).
>
> The query string in the URL reads 'query=%C3%A9clair', which are the
> unicodes for 'A-tilde' and 'Copyright' characters. (Which would imply to
> me that the Browser incorrectly encoded the query.)
How do other browsers behave?
> This makes me feel like I have done something really dumb, but I cannot
> work out what ;)
Well, good luck :)
> Any suggestions?
Have a beer and then come back - maybe you'll see it then ;)
cheers
--
Torsten
Re: form encoding problems
Posted by Jeremy Quinn <je...@media.demon.co.uk>.
On Monday, March 3, 2003, at 10:42 PM, Artur Bialecki wrote:
> You might want to set the following init-params for the Cocoon servlet
> in your web.xml
>
> form-encoding to UTF-8
> container-encoding to ISO-8859-1
I thought I ought to be able to do this, but did not work out how.
regards Jeremy
RE: form encoding problems
Posted by Artur Bialecki <ar...@digitalfairway.com>.
You might want to set the following init-params for the Cocoon servlet
in your web.xml
form-encoding to UTF-8
container-encoding to ISO-8859-1
Artur...
> -----Original Message-----
> From: Jeremy Quinn [mailto:jeremy@media.demon.co.uk]
> Sent: March 3, 2003 1:06 PM
> To: cocoon-dev@xml.apache.org
> Subject: form encoding problems
>
>
> Hi All,
>
> This is possibly a trivial mistake ... but I never came across it
> before.
>
> I have a search form for searching Lucene. Mozilla confirms
> the page is
> in UTF-8 encoding.
>
> I enter a string with accented characters into the query field, e.g.
> 'éclair' (e-acute).
>
> The form comes back with the string now reading 'Ã©clair'. (A-tilde,
> Copyright sign). Mozilla says the encoding is still UTF-8. (The value
> has been picked up by an InputModule and fed via the SiteMap to XSLT).
>
> The query string in the URL reads 'query=%C3%A9clair', which are the
> unicodes for 'A-tilde' and 'Copyright' characters. (Which would imply
> to me that the Browser incorrectly encoded the query.)
>
> This makes me feel like I have done something really dumb,
> but I cannot
> work out what ;)
>
> Incidentally, the search form in the Cocoon Samples does exactly the
> same thing!!
>
> Any suggestions?
>
> regards Jeremy
>
>
Re: form encoding problems
Posted by Jeremy Quinn <je...@media.demon.co.uk>.
On Monday, March 3, 2003, at 06:16 PM, Leo Sutic wrote:
> Are you using Tomcat?
Yeah, sorry I should have mentioned that.
> Tomcat 3.x (I think) has a problem with UTF-8 decoding of parameters.
Tomcat 4.1.18 + Apache2 mod_proxy
>
> I solved it by putting in a <meta http-equiv="encoding" value="ASCII"/>
> tag.
Yek! ;)
How does that mesh with:
<META http-equiv="Content-Type" content="text/html; charset=UTF-8"> ?
> I have no idea if Jetty is affected.
Not tried it either. But earlier I reported encoding problems in
Cocoon's new test page from Jetty.
Hmmm.
Thanks for your reply.
regards Jeremy
RE: form encoding problems
Posted by Leo Sutic <le...@inspireinfrastructure.com>.
Are you using Tomcat?
Tomcat 3.x (I think) has a problem with UTF-8 decoding of parameters.
I solved it by putting in a <meta http-equiv="encoding" value="ASCII"/>
tag.
I have no idea if Jetty is affected.
/LS
> From: Jeremy Quinn [mailto:jeremy@media.demon.co.uk]
> Any suggestions?