Posted to dev@cocoon.apache.org by Marc Portier <mp...@outerthought.org> on 2003/10/31 14:15:03 UTC
[heads up] cocoon's defaults form-encoding and seerialize-encoding
are inconsistent.
Hi all,
we seem to have a small inconsistency concerning the encoding of HTML forms
- our HTML serializer by default is using the UTF-8 encoding.
(in fact it's set nowhere in the system and is thus left over to xalan
which most likely is going down the easy path of assuming the default
from XML land?)
- not setting the form-encoding parameter in cocoon's web.xml defaults
to assuming the browsers are sending the request params in the
ISO-8859-1 encoding (CocoonServlet.java line 500)
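To make the mismatch concrete, here is a minimal sketch in plain Java. It is not Cocoon code: the class and method names are made up for illustration, and it ignores URL percent-encoding. It only assumes that the browser sends form text in UTF-8 (the encoding of the served page) while the server decodes it as ISO-8859-1.

```java
import java.nio.charset.StandardCharsets;

public class EncodingMismatchSketch {

    /**
     * Hypothetical helper: simulates a browser submitting text as UTF-8
     * and a server decoding those bytes as ISO-8859-1.
     */
    public static String submitAndDecode(String typed) {
        // Bytes on the wire, using the encoding of the HTML page (assumed UTF-8).
        byte[] wire = typed.getBytes(StandardCharsets.UTF_8);
        // Server side: decoded with the default form-encoding (ISO-8859-1).
        return new String(wire, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String typed = "\u00e9"; // e with acute accent
        String seen = submitAndDecode(typed);
        // One typed character arrives as two different characters.
        System.out.println(typed.length() + " char typed, " + seen.length() + " chars seen");
    }
}
```

Plain ASCII input survives this round trip unchanged, which is probably why the inconsistency can go unnoticed for a long time.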
Suggested fix:
I'd like to get rid of any possible mismatch between both defaults and
would like to propose to let the AbstractTextSerializer default to
whatever the form-encoding is set to.
(still have to look at how the configure() could get access to that info)
What do people think?
Related discussions
* While at it, shouldn't we kinda default to UTF-8 anyway? even if that
is not the default encoding of the servlet-container? (some gut-feeling
argument: I think cocoon is closer to XML than to servlet-containers?)
* Why is the container-encoding also an init-param? isn't that fixed by
the servlet 2.3 spec?
regards,
-marc=
--
Marc Portier http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
Read my weblog at http://radio.weblogs.com/0116284/
mpo@outerthought.org mpo@apache.org
Re: [heads up] cocoon's defaults form-encoding and seerialize-encoding
are inconsistent.
Posted by Marc Portier <mp...@outerthought.org>.
Sylvain Wallez wrote:
> Marc Portier wrote:
>
>> Hi all,
>>
>> we seem to have a smaal inconsistency concerning encoding of HTML forms
>>
>> - our HTML serializer by default is using the UTF-8 encoding.
>> (in fact it's set nowhere in the system and is thus left over to xalan
>> which most likely is going down the easy path of assuming the default
>> from XML land?)
>>
>> - not setting the form-encoding parameter in cocoon's web.xml defaults
>> to assuming the browsers are sending the request params in the
>> ISO-8859-1 encoding (CocoonServlet.java line 500)
>
>
>
> I encountered this problem and discovered that browsers (at least IE6 &
> Mozilla) send form content using the encoding of the HTML page. But the
> problem is that no header tells the server about the used encoding.
>
indeed, this is a known issue, see for instance the servlet 2.3 spec
section SRV 4.9 Request Data Encoding
cocoon even has a built-in mechanism to survive the issue on servlet 2.2
installations
> What is the supposed way of writing portable applications that
> automagically find the correct encoding?
>
the supposed way is that you consider that the URI contract
communication is not only about the uri and the allowed
request-parameters but also the expected way those request params are
encoded!
so you expect the end-users of your application to be setting the
encoding in their browser according to that contract :-)
in practice this means that
1/ the one generating the html form makes sure he applies that very
encoding on the way out
2/ we all expect that the browser will do a correct auto-detection and
the end-user doesn't (know about how to) change that encoding manually
before submitting the form
the awkward thing is that the HTTP spec has room for letting the browser
communicate what was used as encoding (and the servlet 2.3
implementation should take that into account) BUT NONE OF THE BROWSERS
DO IT.
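As an illustration of what a server could do if browsers did send that information, here is a small sketch in plain Java. The class and method names are hypothetical, and the parsing is deliberately simplistic; it only looks for a charset parameter in a Content-Type header value and otherwise falls back to ISO-8859-1, the default the servlet 2.3 spec prescribes for decoding.

```java
import java.util.Locale;

public class RequestCharsetSketch {

    /**
     * Hypothetical helper: returns the charset named in a Content-Type
     * header value, or "ISO-8859-1" when the client did not send one.
     */
    public static String charsetOf(String contentType) {
        if (contentType != null) {
            // Parameters follow the media type, separated by semicolons.
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String value = p.substring("charset=".length()).trim();
                    // Strip optional surrounding double quotes.
                    if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                        value = value.substring(1, value.length() - 1);
                    }
                    if (!value.isEmpty()) {
                        return value;
                    }
                }
            }
        }
        // Nothing specified by the client: fall back to the spec default.
        return "ISO-8859-1";
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("application/x-www-form-urlencoded"));
        System.out.println(charsetOf("application/x-www-form-urlencoded; charset=UTF-8"));
    }
}
```

Since browsers typically send only the first form of the header, the fallback branch is the one that matters in practice.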
sigh, it is the same kind of historic 'wrong' as
- wrong implementations of 302 redirects (http 1.1 introduced 307 to
allow room for the correct implementation of what http 1.0 intended 302
to be)
http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html (see note inside
10.3.3)
- the wrong spelling of referrer in 'http_referer' (should have been
spelled with a double r)
http://www.google.com/search?q=http_referer+spelling&sourceid=mozilla-search&start=0&start=0&ie=utf-8&oe=utf-8
so, welcome to the web:
we create specs so fast that we can't be bothered with the spelling! (or
the correct implementation)
Wobbly me doesn't mind that much about the folkloristic spelling part ;-)
-marc=
--
Marc Portier http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
Read my weblog at http://radio.weblogs.com/0116284/
mpo@outerthought.org mpo@apache.org
Re: [heads up] cocoon's defaults form-encoding and seerialize-encoding
are inconsistent.
Posted by Sylvain Wallez <sy...@apache.org>.
Marc Portier wrote:
> Hi all,
>
> we seem to have a smaal inconsistency concerning encoding of HTML forms
>
> - our HTML serializer by default is using the UTF-8 encoding.
> (in fact it's set nowhere in the system and is thus left over to xalan
> which most likely is going down the easy path of assuming the default
> from XML land?)
>
> - not setting the form-encoding parameter in cocoon's web.xml defaults
> to assuming the browsers are sending the request params in the
> ISO-8859-1 encoding (CocoonServlet.java line 500)
I encountered this problem and discovered that browsers (at least IE6 &
Mozilla) send form content using the encoding of the HTML page. But the
problem is that no header tells the server about the used encoding.
What is the supposed way of writing portable applications that
automagically find the correct encoding?
Sylvain
--
Sylvain Wallez Anyware Technologies
http://www.apache.org/~sylvain http://www.anyware-tech.com
{ XML, Java, Cocoon, OpenSource }*{ Training, Consulting, Projects }
Orixo, the opensource XML business alliance - http://www.orixo.com
Re: [heads up] cocoon's defaults form-encoding and seerialize-encoding
are inconsistent.
Posted by Marc Portier <mp...@outerthought.org>.
Joerg Heinicke wrote:
> On 03.11.2003 11:01, Reinhard Poetz wrote:
>
>> Yes, thank you Marc!
>>
>> I would prefer iso-8859-1 but this is just a feeling and no opinion
>> based on facts ;-)
>
>
> Even if it's only one parameter to change I would like to support
> non-ISO-characters by default and so prefer UTF-8.
>
Joerg,
I had the same original reflex, but have to say I tend to lean towards
the ISO-8859-1 approach ATM.
The point is that the XML-out would still cater for non-ISO-characters
by having a serializer that introduces character references like &#8364;
for the euro sign and the like...
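A rough sketch of that idea in plain Java follows. It is not the actual serializer code (which is left to xalan); the class and method names are invented for illustration, and it does not escape markup characters such as & or <. It only replaces characters the target encoding cannot represent with numeric character references.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class CharRefSketch {

    /**
     * Hypothetical helper: replaces every character that the target
     * charset cannot encode with a decimal character reference.
     */
    public static String escapeUnencodable(String text, Charset target) {
        CharsetEncoder encoder = target.newEncoder();
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            String ch = new String(Character.toChars(cp));
            if (encoder.canEncode(ch)) {
                out.append(ch);                        // representable as-is
            } else {
                out.append("&#").append(cp).append(';'); // e.g. &#8364;
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // The euro sign (U+20AC) is not part of ISO-8859-1.
        System.out.println(escapeUnencodable("5 \u20ac", StandardCharsets.ISO_8859_1));
    }
}
```

Note that this only helps on the way out: a euro sign typed into a form on an ISO-8859-1 page still has to be handled by the browser on the way back in.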
-marc=
--
Marc Portier http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
Read my weblog at http://radio.weblogs.com/0116284/
mpo@outerthought.org mpo@apache.org
Re: [heads up] cocoon's defaults form-encoding and seerialize-encoding
are inconsistent.
Posted by Joerg Heinicke <jh...@virbus.de>.
On 03.11.2003 11:01, Reinhard Poetz wrote:
> Yes, thank you Marc!
>
> I would prefer iso-8859-1 but this is just a feeling and no opinion
> based on facts ;-)
Even if it's only one parameter to change I would like to support
non-ISO-characters by default and so prefer UTF-8.
Joerg
Re: [heads up] cocoon's defaults form-encoding and seerialize-encoding
are inconsistent.
Posted by Torsten Curdt <tc...@vafer.org>.
> Yes, thank you Marc!
>
> I would prefer iso-8859-1 but this is just a feeling and no opinion
> based on facts ;-)
me, too
IIRC, at dff we had some encoding issues in the past
...all I remember is that we switched to iso-8859-1 and they
were gone.
...but they might have been caused by the exact mismatch
you are talking about
--
Torsten
RE: [heads up] cocoon's defaults form-encoding and seerialize-encoding are inconsistent.
Posted by Reinhard Poetz <re...@apache.org>.
From: Marc Portier
<snip/>
> So as a recap:
>
> Given the fact that todays browser behaviour is coupling
> 1. the encoding of the HTML-stream (from server to browser)
> describing
> the <form>
> to
> 2. the encoding used to encode the request params in the HTTP-request
> hosting the form-submit (from browser to server),
>
> the web-app-developer is kind of forced into doing a decent effort in
> making sure on the server-side he is decoding the request-params with
> the same encoding as was used to serialize the HTML with.
>
> The above observation made me label our current default-settings for
> both encodings inside Cocoon to be 'inconsistent':
> - if you don't specify an encoding for the serializer (sitemap.xmap)
> it's utf-8
> - if you don't specify an encoding for the form-decoding
> (web.xml) then
> it is iso-8859-1
>
>
> To fix this I'ld like to:
> use the context as described above to communicate the chosen (or
> implicit) form-decoding to the AbstractTextSerializer so it
> can use that
> as a natural default-encoding (currently there is no such thing as a
> default-encoding for the AbstractTextSerializer resulting in it being
> chosen by xalan)
>
> as a consequence however this would mean that the
> default-encoding for
> the serializers changes from utf-8 to iso-8859-1
>
> we could take the other path and let the fix go together with
> changing
> the form-decoding to utf-8
>
>
> The remaining question being: Which path do people prefer? Are there
> clear argumentations to rule out one or the other? do we vote?
>
> -marc=
> PS: I do hope this clears out the confusion?
Yes, thank you Marc!
I would prefer iso-8859-1 but this is just a feeling and no opinion
based on facts ;-)
--
Reinhard
Re: [heads up] cocoon's defaults form-encoding and seerialize-encoding
are inconsistent.
Posted by Marc Portier <mp...@outerthought.org>.
Reinhard Poetz wrote:
>
> The parameter CONTEXT_DEFAULT_ENCODING is set in Constants.java - how
> can I override this value?
>
you don't:
its value IS NOT the encoding; its value is just the lookup key inside
the context used to read the DEFAULT_ENCODING
as for the remaining question 'where do I set the value then?'
there currently is a servlet init-param one can set inside the web.xml
which is called 'form-encoding'
the whole reasoning built up in this thread has been to
1/ use that same setting as the default for our
text-oriented serializers (i.e. anything below AbstractTextSerializer in
the inheritance chain) in order to avoid, as much as possible, the
inconsistency we are facing now.
2/ implement this by adding that setting to the Context and letting the
AbstractTextSerializer be Contextualizable
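The two points above can be sketched roughly as follows. This is a simplified stand-in, not the actual patch: a plain Map plays the role of the Avalon Context, TextSerializer is a toy class rather than AbstractTextSerializer, and the string value of the lookup key is a placeholder since the thread does not state it.

```java
import java.util.HashMap;
import java.util.Map;

public class DefaultEncodingSketch {

    // Lookup key only, NOT the encoding itself. The value "default-encoding"
    // is a placeholder chosen for this sketch.
    public static final String CONTEXT_DEFAULT_ENCODING = "default-encoding";

    /** Toy stand-in for a text serializer that can be contextualized. */
    public static class TextSerializer {
        private String encoding; // null would mean "left to the XSLT processor"

        /** Picks up the default encoding published in the context. */
        public void contextualize(Map<String, Object> context) {
            Object value = context.get(CONTEXT_DEFAULT_ENCODING);
            if (value != null) {
                encoding = value.toString();
            }
        }

        /** An explicit sitemap setting still overrides the default. */
        public void configure(String sitemapEncoding) {
            if (sitemapEncoding != null) {
                encoding = sitemapEncoding;
            }
        }

        public String getEncoding() {
            return encoding;
        }
    }

    /** Stand-in for the servlet init(): publishes form-encoding in the context. */
    public static Map<String, Object> initContext(String formEncodingInitParam) {
        Map<String, Object> context = new HashMap<>();
        // Without the init-param, form decoding defaults to ISO-8859-1.
        context.put(CONTEXT_DEFAULT_ENCODING,
                formEncodingInitParam != null ? formEncodingInitParam : "ISO-8859-1");
        return context;
    }

    public static void main(String[] args) {
        TextSerializer serializer = new TextSerializer();
        serializer.contextualize(initContext(null));
        System.out.println(serializer.getEncoding());
    }
}
```

The order matters in this sketch: contextualize() supplies the default first, and configure() only replaces it when the sitemap names an encoding explicitly.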
>
>>
>>personally I think this patch should come together with a
>>change to our
>>web.xml so we rather change the default form-encoding to be
>>also "utf-8"
>
>
> sorry, I don't understand this. Does this mean the general encoding is
> iso-8859-1 and the form encoding is UTF-8? If yes, why two different
> encodings?
>
by now Joerg and Bruno have been adding enough to the thread to see that
there are more than just two encodings in this world, and quite
interestingly: they can all be different :-)
I understand that this can easily become confusing, and that is the main
reason I didn't want to expand the discussion to any other encodings
than the ones at hand here.
So as a recap:
Given the fact that today's browser behaviour is coupling
1. the encoding of the HTML-stream (from server to browser) describing
the <form>
to
2. the encoding used to encode the request params in the HTTP-request
hosting the form-submit (from browser to server),
the web-app-developer is kind of forced into doing a decent effort in
making sure on the server-side he is decoding the request-params with
the same encoding as was used to serialize the HTML with.
The above observation made me label our current default-settings for
both encodings inside Cocoon to be 'inconsistent':
- if you don't specify an encoding for the serializer (sitemap.xmap)
it's utf-8
- if you don't specify an encoding for the form-decoding (web.xml) then
it is iso-8859-1
To fix this I'd like to:
use the context as described above to communicate the chosen (or
implicit) form-decoding to the AbstractTextSerializer so it can use that
as a natural default-encoding (currently there is no such thing as a
default-encoding for the AbstractTextSerializer resulting in it being
chosen by xalan)
as a consequence however this would mean that the default-encoding for
the serializers changes from utf-8 to iso-8859-1
we could take the other path and let the fix go together with changing
the form-decoding to utf-8
The remaining question being: Which path do people prefer? Are there
clear arguments to rule out one or the other? do we vote?
-marc=
PS: I do hope this clears out the confusion?
--
Marc Portier http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
Read my weblog at http://radio.weblogs.com/0116284/
mpo@outerthought.org mpo@apache.org
Re: [heads up] cocoon's defaults form-encoding
and seerialize-encoding are inconsistent.
Posted by Bruno Dumon <br...@outerthought.org>.
On Sat, 2003-11-01 at 12:58, Joerg Heinicke wrote:
> Now I'm confused ...
>
> With the container encoding all resources are read, i.e. my text files
> and the request.
Nope, these are two different encodings:
* text files are read according to whatever encoding/locale is
configured in your OS (unless you supply special parameters when
starting the JVM)
* request parameters are always decoded using ISO-8859-1
See also section 4.9 in the servlet 2.3 spec:
-- begin quote
Currently, many browsers do not send a char encoding qualifier with the
Content-Type header, leaving open the determination of the character
encoding for reading HTTP requests. The default encoding of a request
the container uses to create the request reader and parse POST data must
be ISO-8859-1, if none has been specified by the client request.
However, in order to indicate to the developer in this case the failure
of the client to send a character encoding, the container returns null
from the getCharacterEncoding method. If the client hasn't set character
encoding and the request data is encoded with a different encoding than
the default as described above, breakage can occur. To remedy this
situation, a new method setCharacterEncoding(String enc) has been added
to the ServletRequest interface. Developers can override the character
encoding supplied by the container by calling this method. It must be
called prior to parsing any post data or reading any input from the
request. Calling this method once data has been read will not affect the
encoding.
-- end quote
Since the mentioned setCharacterEncoding hasn't been supported for long
(and must be called before any request parameter is read), Cocoon has its
own mechanism to fix this, which does something like:
new String(value.getBytes(container_encoding), form_encoding);
container_encoding should always be ISO-8859-1 (unless you have a broken
servlet container), and form_encoding should be the same one as on your
serializer.
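The one-liner above can be tried out in isolation with a small sketch like the one below. This is plain Java, not Cocoon's actual implementation; the class and method names are made up, and it assumes the container decoded with ISO-8859-1 while the form was really submitted in UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class FormRecodeSketch {

    /**
     * Hypothetical helper: undoes the container's decoding to get the
     * original bytes back, then decodes them with the form encoding.
     */
    public static String recode(String value, Charset containerEncoding, Charset formEncoding) {
        return new String(value.getBytes(containerEncoding), formEncoding);
    }

    public static void main(String[] args) {
        // What the container hands over when UTF-8 bytes for e-acute
        // were decoded as ISO-8859-1.
        String garbled = "\u00c3\u00a9";
        String fixed = recode(garbled, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println(fixed.equals("\u00e9"));
    }
}
```

The trick relies on ISO-8859-1 mapping every byte to exactly one character, so getBytes() recovers the original bytes without loss. With a container that decodes using some other encoding, that is not guaranteed to hold.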
> The form encoding only recodes the request parameters
> to the expected (i.e. container) encoding. So it works like a servlet
> filter.
>
> Joerg
>
> On 01.11.2003 12:36, Bruno Dumon wrote:
>
> > On Sat, 2003-11-01 at 12:24, Joerg Heinicke wrote:
> >
> >>On 01.11.2003 12:08, Reinhard Poetz wrote:
> >>
> >>
> >>>>personally I think this patch should come together with a
> >>>>change to our
> >>>>web.xml so we rather change the default form-encoding to be
> >>>>also "utf-8"
> >>>
> >>>
> >>>sorry, I don't understand this. Does this mean the general encoding is
> >>>iso-8859-1 and the form encoding is UTF-8? If yes, why two different
> >>>encodings?
> >>
> >>These are two different things.
> >>
> >>On the one hand there is the container encoding. It defines with which
> >>encoding textfiles are read, e.g. properties files. It's about servlet
> >>container <=> file system.
> >>
> >
> >
> > The "container encoding" mentioned here is the encoding with which the
> > servlet container decoded request parameters. The servlet spec says that
> > this should always be ISO-8859-1 (unless the client specified another
> > encoding or, from 2.3, request.setCharacterEncoding is used). This
> > parameter has nothing to do with the encoding used to decode e.g. text
> > files, and should normally always be left to ISO-8859-1.
> >
> > Some more info about all this can be found on this wiki page:
> > http://wiki.cocoondev.org/Wiki.jsp?page=RequestParameterEncoding
> >
> >
> >>On the other hand there is the form encoding. It defines with which
> >>encoding requests are read. It's about servlet container <=> clients.
> >>
> >>I hope it's correct so.
--
Bruno Dumon http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
bruno@outerthought.org bruno@apache.org
Re: [heads up] cocoon's defaults form-encoding and seerialize-encoding
are inconsistent.
Posted by Joerg Heinicke <jh...@virbus.de>.
Now I'm confused ...
With the container encoding all resources are read, i.e. my text files
and the request. The form encoding only recodes the request parameters
to the expected (i.e. container) encoding. So it works like a servlet
filter.
Joerg
On 01.11.2003 12:36, Bruno Dumon wrote:
> On Sat, 2003-11-01 at 12:24, Joerg Heinicke wrote:
>
>>On 01.11.2003 12:08, Reinhard Poetz wrote:
>>
>>
>>>>personally I think this patch should come together with a
>>>>change to our
>>>>web.xml so we rather change the default form-encoding to be
>>>>also "utf-8"
>>>
>>>
>>>sorry, I don't understand this. Does this mean the general encoding is
>>>iso-8859-1 and the form encoding is UTF-8? If yes, why two different
>>>encodings?
>>
>>These are two different things.
>>
>>On the one hand there is the container encoding. It defines with which
>>encoding textfiles are read, e.g. properties files. It's about servlet
>>container <=> file system.
>>
>
>
> The "container encoding" mentioned here is the encoding with which the
> servlet container decoded request parameters. The servlet spec says that
> this should always be ISO-8859-1 (unless the client specified another
> encoding or, from 2.3, request.setCharacterEncoding is used). This
> parameter has nothing to do with the encoding used to decode e.g. text
> files, and should normally always be left to ISO-8859-1.
>
> Some more info about all this can be found on this wiki page:
> http://wiki.cocoondev.org/Wiki.jsp?page=RequestParameterEncoding
>
>
>>On the other hand there is the form encoding. It defines with which
>>encoding requests are read. It's about servlet container <=> clients.
>>
>>I hope it's correct so.
Re: [heads up] cocoon's defaults form-encoding and
seerialize-encoding are inconsistent.
Posted by Bruno Dumon <br...@outerthought.org>.
On Sat, 2003-11-01 at 12:24, Joerg Heinicke wrote:
> On 01.11.2003 12:08, Reinhard Poetz wrote:
>
> >>personally I think this patch should come together with a
> >>change to our
> >>web.xml so we rather change the default form-encoding to be
> >>also "utf-8"
> >
> >
> > sorry, I don't understand this. Does this mean the general encoding is
> > iso-8859-1 and the form encoding is UTF-8? If yes, why two different
> > encodings?
>
> These are two different things.
>
> On the one hand there is the container encoding. It defines with which
> encoding textfiles are read, e.g. properties files. It's about servlet
> container <=> file system.
>
The "container encoding" mentioned here is the encoding with which the
servlet container decoded request parameters. The servlet spec says that
this should always be ISO-8859-1 (unless the client specified another
encoding or, from 2.3, request.setCharacterEncoding is used). This
parameter has nothing to do with the encoding used to decode e.g. text
files, and should normally always be left to ISO-8859-1.
Some more info about all this can be found on this wiki page:
http://wiki.cocoondev.org/Wiki.jsp?page=RequestParameterEncoding
> On the other hand there is the form encoding. It defines with which
> encoding requests are read. It's about servlet container <=> clients.
>
> I hope it's correct so.
--
Bruno Dumon http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
bruno@outerthought.org bruno@apache.org
Re: [heads up] cocoon's defaults form-encoding and seerialize-encoding
are inconsistent.
Posted by Joerg Heinicke <jh...@virbus.de>.
On 01.11.2003 12:08, Reinhard Poetz wrote:
>>personally I think this patch should come together with a
>>change to our
>>web.xml so we rather change the default form-encoding to be
>>also "utf-8"
>
>
> sorry, I don't understand this. Does this mean the general encoding is
> iso-8859-1 and the form encoding is UTF-8? If yes, why two different
> encodings?
These are two different things.
On the one hand there is the container encoding. It defines with which
encoding textfiles are read, e.g. properties files. It's about servlet
container <=> file system.
On the other hand there is the form encoding. It defines with which
encoding requests are read. It's about servlet container <=> clients.
I hope it's correct so.
Joerg
RE: [heads up] cocoon's defaults form-encoding and seerialize-encoding are inconsistent.
Posted by Reinhard Poetz <re...@apache.org>.
From: Marc Portier
> OK,
>
> Thx to Carsten's suggestions I have a patch for this that
> rougly looks like
>
>
>
> 1/ in src/java/org/apache/cocoon/Constants.java
> . add constant CONTEXT_DEFAULT_ENCODING
>
>
> 2/ in
> src/java/org/apache/cocoon/serialization/AbstractTextSerializer.java
>
> . add imports for Contextualizable
> . add interface to class declaration
> . use contextualize method to set default encoding to what
> is set in
> the context
> . note that the configure can still change it depending on the
> sitemap conf
>
>
> 3/ in src/java/org/apache/cocoon/servlet/CocoonServlet.java
>
> . in the init() we add the default encoding to the context as read
> from the servlet-initParameter "form-encoding"
>
>
>
> now, since the last defaults to iso-8859-1 there is a bit of a
> side-effect to this patch which I introduced in my original posting
>
>
>
> >>> * While at it, shouldn't we kinda default to UTF-8 anyway? even if
> >>> that is not the default encoding of the servlet-container? (some
> >>> gutfeeling argument: I think cocoon is closer to XML then to
> >>> servlet-containers?)
> >>>
>
> if I just apply the patch as described above the side-effect will be
> that the default-serialization for all our text-serializers (unless
> overriden by the config in the sitemap.xmap) will change from utf-8
> (more precisely: whatever xalan defaults to) to iso-8859-1
>
>
> maybe that isn't that bad, but just wanted to make you all aware.
> do we need a vote on this, or do I just as I redeem best?
The parameter CONTEXT_DEFAULT_ENCODING is set in Constants.java - how
can I override this value?
>
>
> personally I think this patch should come together with a
> change to our
> web.xml so we rather change the default form-encoding to be
> also "utf-8"
sorry, I don't understand this. Does this mean the general encoding is
iso-8859-1 and the form encoding is UTF-8? If yes, why two different
encodings?
Cheers,
Reinhard
Re: [heads up] cocoon's defaults form-encoding and seerialize-encoding
are inconsistent.
Posted by Marc Portier <mp...@outerthought.org>.
OK,
Thx to Carsten's suggestions I have a patch for this that roughly looks like
1/ in src/java/org/apache/cocoon/Constants.java
. add constant CONTEXT_DEFAULT_ENCODING
2/ in src/java/org/apache/cocoon/serialization/AbstractTextSerializer.java
. add imports for Contextualizable
. add interface to class declaration
. use contextualize method to set default encoding to what is set in
the context
. note that the configure can still change it depending on the
sitemap conf
3/ in src/java/org/apache/cocoon/servlet/CocoonServlet.java
. in the init() we add the default encoding to the context as read
from the servlet-initParameter "form-encoding"
now, since the last defaults to iso-8859-1 there is a bit of a
side-effect to this patch which I introduced in my original posting
>>> * While at it, shouldn't we kinda default to UTF-8 anyway? even if
>>> that is not the default encoding of the servlet-container? (some
>>> gutfeeling argument: I think cocoon is closer to XML then to
>>> servlet-containers?)
>>>
if I just apply the patch as described above the side-effect will be
that the default-serialization for all our text-serializers (unless
overridden by the config in the sitemap.xmap) will change from utf-8
(more precisely: whatever xalan defaults to) to iso-8859-1
maybe that isn't that bad, but just wanted to make you all aware.
do we need a vote on this, or do I just do as I deem best?
personally I think this patch should come together with a change to our
web.xml so we rather change the default form-encoding to be also "utf-8"
other opinions?
thx for any feedback
-marc=
--
Marc Portier http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
Read my weblog at http://radio.weblogs.com/0116284/
mpo@outerthought.org mpo@apache.org
RE: [heads up] cocoon's defaults form-encoding and seerialize-encoding are inconsistent.
Posted by Carsten Ziegeler <cz...@s-und-n.de>.
Marc Portier wrote:
>
> However, what does this say about using cocoon outside the
> servlet-context?
>
> In every case: I like the idea of using the contextualize() but maybe it
> makes more sense if we don't introduce a dependency between
> AbstractTextSerializer and servlet.jar?
>
Ok, that's true :(
> So maybe the Context directly should be augmented with access to this
> kind of 'global' info?
>
Yes, why not. Perhaps adding all parameters (either from web.xml or
cli) to the context?
Carsten
Re: [heads up] cocoon's defaults form-encoding and seerialize-encoding
are inconsistent.
Posted by Marc Portier <mp...@outerthought.org>.
Carsten Ziegeler wrote:
> FYI, the Context (you get via Contextualizable) contains the ServletConfig
> via a constant defined in the CocoonServlet.
> This is something your two collegues were wondering about, but it might
> be that it helps you :)
>
Yep, sounds like a way out of the stalemate I was facing..
However, what does this say about using cocoon outside the servlet-context?
In any case: I like the idea of using the contextualize() but maybe it
makes more sense if we don't introduce a dependency between
AbstractTextSerializer and servlet.jar?
So maybe the Context directly should be augmented with access to this
kind of 'global' info?
-marc=
> Carsten
>
>
>>-----Original Message-----
>>From: Marc Portier [mailto:mpo@outerthought.org]
>>Sent: Friday, October 31, 2003 2:15 PM
>>To: dev@cocoon.apache.org
>>Subject: [heads up] cocoon's defaults form-encoding and
>>seerialize-encoding are inconsistent.
>>
>>
>>Hi all,
>>
>>we seem to have a smaal inconsistency concerning encoding of HTML forms
>>
>>- our HTML serializer by default is using the UTF-8 encoding.
>>(in fact it's set nowhere in the system and is thus left over to xalan
>>which most likely is going down the easy path of assuming the default
>>from XML land?)
>>
>>- not setting the form-encoding parameter in cocoon's web.xml defaults
>>to assuming the browsers are sending the request params in the
>>ISO-8859-1 encoding (CocoonServlet.java line 500)
>>
>>
>>Suggested fix:
>>I'ld like to get rid of any possible mismatch between both defaults and
>>would like to propose to let the AbstractTextSerializer default to
>>whatever the form-encoding is reading.
>>(still have to look how the configure() could have access to that info)
>>
>>
>>What do people think?
>>
>>
>>
>>Related discussions
>>
>>* While at it, shouldn't we kinda default to UTF-8 anyway? even if that
>>is not the default encoding of the servlet-container? (some gutfeeling
>>argument: I think cocoon is closer to XML then to servlet-containers?)
>>
>>* Why is the container-encoding also an init-param? isn't that fixed by
>>the servlet 2.3 spec?
>>
>>
>>regards,
>>-marc=
>>--
>>Marc Portier http://outerthought.org/
>>Outerthought - Open Source, Java & XML Competence Support Center
>>Read my weblog at http://radio.weblogs.com/0116284/
>>mpo@outerthought.org mpo@apache.org
>>
>>
>
>
--
Marc Portier http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
Read my weblog at http://radio.weblogs.com/0116284/
mpo@outerthought.org mpo@apache.org
RE: [heads up] cocoon's defaults form-encoding and seerialize-encoding are inconsistent.
Posted by Carsten Ziegeler <cz...@s-und-n.de>.
FYI, the Context (you get via Contextualizable) contains the ServletConfig
via a constant defined in the CocoonServlet.
This is something your two colleagues were wondering about, but it might
be that it helps you :)
Carsten
> -----Original Message-----
> From: Marc Portier [mailto:mpo@outerthought.org]
> Sent: Friday, October 31, 2003 2:15 PM
> To: dev@cocoon.apache.org
> Subject: [heads up] cocoon's defaults form-encoding and
> seerialize-encoding are inconsistent.
>
>
> Hi all,
>
> we seem to have a smaal inconsistency concerning encoding of HTML forms
>
> - our HTML serializer by default is using the UTF-8 encoding.
> (in fact it's set nowhere in the system and is thus left over to xalan
> which most likely is going down the easy path of assuming the default
> from XML land?)
>
> - not setting the form-encoding parameter in cocoon's web.xml defaults
> to assuming the browsers are sending the request params in the
> ISO-8859-1 encoding (CocoonServlet.java line 500)
>
>
> Suggested fix:
> I'ld like to get rid of any possible mismatch between both defaults and
> would like to propose to let the AbstractTextSerializer default to
> whatever the form-encoding is reading.
> (still have to look how the configure() could have access to that info)
>
>
> What do people think?
>
>
>
> Related discussions
>
> * While at it, shouldn't we kinda default to UTF-8 anyway? even if that
> is not the default encoding of the servlet-container? (some gutfeeling
> argument: I think cocoon is closer to XML then to servlet-containers?)
>
> * Why is the container-encoding also an init-param? isn't that fixed by
> the servlet 2.3 spec?
>
>
> regards,
> -marc=
> --
> Marc Portier http://outerthought.org/
> Outerthought - Open Source, Java & XML Competence Support Center
> Read my weblog at http://radio.weblogs.com/0116284/
> mpo@outerthought.org mpo@apache.org
>
>