Posted to dev@cocoon.apache.org by Marc Portier <mp...@outerthought.org> on 2003/10/31 14:15:03 UTC

[heads up] cocoon's default form-encoding and serialize-encoding are inconsistent.

Hi all,

we seem to have a small inconsistency concerning the encoding of HTML forms:

- our HTML serializer uses the UTF-8 encoding by default.
(in fact it's set nowhere in the system and is thus left up to xalan, 
which most likely takes the easy path of assuming the default 
from XML land?)

- when the form-encoding parameter is not set in cocoon's web.xml, cocoon 
assumes the browsers are sending the request params in the 
ISO-8859-1 encoding (CocoonServlet.java line 500)
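
To make the mismatch concrete, here is a minimal Java sketch. It is not
Cocoon code: the class and method names are made up for illustration, and it
simply assumes the browser submits the form in the page's encoding (UTF-8)
while the server decodes the bytes as ISO-8859-1.

```java
import java.nio.charset.StandardCharsets;

// Illustration only: EncodingMismatchDemo is a hypothetical name, not part of Cocoon.
final class EncodingMismatchDemo {

    /** Encodes the text as a UTF-8 page would, then decodes it the way an ISO-8859-1 default would. */
    static String decodeAsLatin1(String typedByUser) {
        byte[] onTheWire = typedByUser.getBytes(StandardCharsets.UTF_8);
        return new String(onTheWire, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String received = decodeAsLatin1("caf\u00e9"); // the user typed "café"
        // The single character U+00E9 arrives as the two characters U+00C3 U+00A9.
        System.out.println(received.equals("caf\u00c3\u00a9")); // prints "true"
    }
}
```

Plain ASCII input survives this unchanged, which is probably why the
inconsistency is easy to miss until someone types an accented character.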


Suggested fix:
I'd like to get rid of any possible mismatch between the two defaults, and 
propose to let the AbstractTextSerializer default to 
whatever the form-encoding is set to.
(I still have to look at how configure() could get access to that info)


What do people think?



Related discussions

* While at it, shouldn't we kinda default to UTF-8 anyway? even if that 
is not the default encoding of the servlet-container? (some gut-feeling 
argument: I think cocoon is closer to XML than to servlet-containers?)

* Why is the container-encoding also an init-param? isn't that fixed by 
the servlet 2.3 spec?


regards,
-marc=
-- 
Marc Portier                            http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
Read my weblog at              http://radio.weblogs.com/0116284/
mpo@outerthought.org                              mpo@apache.org



Re: [heads up] cocoon's default form-encoding and serialize-encoding are inconsistent.

Posted by Marc Portier <mp...@outerthought.org>.

Sylvain Wallez wrote:

> Marc Portier wrote:
> 
>> Hi all,
>>
>> we seem to have a small inconsistency concerning encoding of HTML forms
>>
>> - our HTML serializer by default is using the UTF-8 encoding.
>> (in fact it's set nowhere in the system and is thus left over to xalan 
>> which most likely is going down the easy path of assuming the default 
>> from XML land?)
>>
>> - not setting the form-encoding parameter in cocoon's web.xml defaults 
>> to assuming the browsers are sending the request params in the 
>> ISO-8859-1 encoding (CocoonServlet.java line 500)
> 
> 
> 
> I encountered this problem and discovered that browsers (at least IE6 & 
> Mozilla) send form content using the encoding of the HTML page. But the 
> problem is that no header tells the server about the used encoding.
> 

indeed, this is a known issue, see for instance the servlet 2.3 spec,
section SRV 4.9 Request Data Encoding

cocoon even has a built-in mechanism to survive the issue on 2.2 installations

> What is the supposed way of writing portable applications that 
> automagically find the correct encoding?
> 

the supposed way is to consider that the URI contract is not only 
about the uri and the allowed request-parameters but also about the 
way those request params are expected to be encoded!

so you expect the end-users of your application to be setting the 
encoding in their browser according to that contract :-)

in practice this means that
1/ the one generating the html form makes sure he applies that very 
encoding on the way out
2/ we all expect that the browser will do a correct auto-detection and 
the end-user doesn't (know about how to) change that encoding manually 
before submitting the form

the awkward thing is that the HTTP spec has room for letting the browser 
communicate what was used as encoding (and the servlet 2.3 
implementation should take that into account) BUT NONE OF THE BROWSERS 
DO IT.
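
For reference, the "room" in question is the charset parameter of the
Content-Type request header. The sketch below is a deliberately naive,
hypothetical parser (it is not the servlet API and does not handle every
corner of the header grammar); it only shows what a server could read if
browsers did send the parameter, and that it gets nothing when they don't.

```java
// Illustration only: ContentTypeCharset is a hypothetical helper, not a servlet-API class.
final class ContentTypeCharset {

    /** Returns the charset parameter of a Content-Type header value, or null if there is none. */
    static String charsetOf(String contentType) {
        if (contentType == null) {
            return null;
        }
        String[] parts = contentType.split(";");
        // parts[0] is the media type itself; parameters start at index 1.
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq > 0 && param.substring(0, eq).trim().equalsIgnoreCase("charset")) {
                String value = param.substring(eq + 1).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // What a well-behaved client could send:
        System.out.println(charsetOf("application/x-www-form-urlencoded; charset=UTF-8"));
        // What browsers send in practice according to this thread: no charset at all.
        System.out.println(charsetOf("application/x-www-form-urlencoded"));
    }
}
```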



sigh, it is the same kind of historic 'wrong' as

- wrong implementations of 302 redirects (http 1.1 introduced 307 to 
allow room for the correct implementation of what http 1.0 intended 302 
to be)
http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html (see note inside 
10.3.3)

- the wrong spelling of referrer in 'http_referer' (should have been two 
r's )
http://www.google.com/search?q=http_referer+spelling&sourceid=mozilla-search&start=0&start=0&ie=utf-8&oe=utf-8




so, welcome to the web:
we create specs so fast that we can't be bothered with the spelling! (or 
the correct implementation)



Wobbly me doesn't mind that much about the folkloristic spelling part ;-)

-marc=
-- 
Marc Portier                            http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
Read my weblog at              http://radio.weblogs.com/0116284/
mpo@outerthought.org                              mpo@apache.org


Re: [heads up] cocoon's default form-encoding and serialize-encoding are inconsistent.

Posted by Sylvain Wallez <sy...@apache.org>.
Marc Portier wrote:

> Hi all,
>
> we seem to have a small inconsistency concerning encoding of HTML forms
>
> - our HTML serializer by default is using the UTF-8 encoding.
> (in fact it's set nowhere in the system and is thus left over to xalan 
> which most likely is going down the easy path of assuming the default 
> from XML land?)
>
> - not setting the form-encoding parameter in cocoon's web.xml defaults 
> to assuming the browsers are sending the request params in the 
> ISO-8859-1 encoding (CocoonServlet.java line 500)


I encountered this problem and discovered that browsers (at least IE6 & 
Mozilla) send form content using the encoding of the HTML page. But the 
problem is that no header tells the server which encoding was used.

What is the supposed way of writing portable applications that 
automagically find the correct encoding?

Sylvain

-- 
Sylvain Wallez                                  Anyware Technologies
http://www.apache.org/~sylvain           http://www.anyware-tech.com
{ XML, Java, Cocoon, OpenSource }*{ Training, Consulting, Projects }
Orixo, the opensource XML business alliance  -  http://www.orixo.com



Re: [heads up] cocoon's default form-encoding and serialize-encoding are inconsistent.

Posted by Marc Portier <mp...@outerthought.org>.

Joerg Heinicke wrote:
> On 03.11.2003 11:01, Reinhard Poetz wrote:
> 
>> Yes, thank you Marc!
>>
>> I would prefer iso-8859-1 but this is just a feeling and no opinion
>> based on facts ;-)
> 
> 
> Even if it's only one parameter to change I would like to support 
> non-ISO-characters by default and so prefer UTF-8.
> 

Joerg,

I had the same original reflex, but have to say I tend to lean towards 
the ISO-8859-1 approach ATM.

The point is that the XML-out would still cater for non-ISO-characters 
by having a serializer that introduces character references like &#8364; 
for the euro sign and the like...
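
As a rough sketch of that idea (this is not the xalan or Cocoon serializer,
and the class name is invented for the example): every character that
ISO-8859-1 cannot represent is written as a numeric character reference.
Markup escaping of & and < is deliberately left out to keep it short.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Illustration only: CharRefSketch is a hypothetical name, not a Cocoon or xalan class.
final class CharRefSketch {

    /** Replaces every code point that ISO-8859-1 cannot encode with a decimal character reference. */
    static String escapeForLatin1(String text) {
        CharsetEncoder encoder = StandardCharsets.ISO_8859_1.newEncoder();
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (Character.isBmpCodePoint(cp) && encoder.canEncode((char) cp)) {
                out.appendCodePoint(cp);                   // representable: keep as is
            } else {
                out.append("&#").append(cp).append(';');  // not representable: &#NNNN;
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeForLatin1("5 \u20ac")); // prints "5 &#8364;"
    }
}
```

Note that this only covers the way out (server to browser); it says nothing
about how a browser submits such characters from an ISO-8859-1 page.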

-marc=
-- 
Marc Portier                            http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
Read my weblog at              http://radio.weblogs.com/0116284/
mpo@outerthought.org                              mpo@apache.org


Re: [heads up] cocoon's default form-encoding and serialize-encoding are inconsistent.

Posted by Joerg Heinicke <jh...@virbus.de>.
On 03.11.2003 11:01, Reinhard Poetz wrote:

> Yes, thank you Marc!
> 
> I would prefer iso-8859-1 but this is just a feeling and no opinion
> based on facts ;-)

Even if it's only one parameter to change I would like to support 
non-ISO-characters by default and so prefer UTF-8.

Joerg


Re: [heads up] cocoon's default form-encoding and serialize-encoding are inconsistent.

Posted by Torsten Curdt <tc...@vafer.org>.
> Yes, thank you Marc!
> 
> I would prefer iso-8859-1 but this is just a feeling and no opinion
> based on facts ;-)

me, too

IIRC at dff we had some encoding issues in the past
...all I remember was we switched to iso-8859-1 and they
were gone.

...but they might have been caused by the exact mismatch
you are talking about
--
Torsten


RE: [heads up] cocoon's default form-encoding and serialize-encoding are inconsistent.

Posted by Reinhard Poetz <re...@apache.org>.
From: Marc Portier

<snip/>

> So as a recap:
> 
> Given the fact that today's browser behaviour is coupling
> 1. the encoding of the HTML-stream (from server to browser) describing
> the <form>
> to
> 2. the encoding used to encode the request params in the HTTP-request
> hosting the form-submit (from browser to server),
> 
> the web-app-developer is kind of forced into making a decent effort to
> ensure that on the server-side he is decoding the request-params with
> the same encoding as was used to serialize the HTML.
> 
> The above observation made me label our current default-settings for
> both encodings inside Cocoon to be 'inconsistent':
> - if you don't specify an encoding for the serializer (sitemap.xmap)
> it's utf-8
> - if you don't specify an encoding for the form-decoding (web.xml) then
> it is iso-8859-1
> 
> 
> To fix this I'd like to:
> use the context as described above to communicate the chosen (or
> implicit) form-decoding to the AbstractTextSerializer so it can use that
> as a natural default-encoding (currently there is no such thing as a
> default-encoding for the AbstractTextSerializer resulting in it being
> chosen by xalan)
> 
> as a consequence however this would mean that the default-encoding for
> the serializers changes from utf-8 to iso-8859-1
> 
> we could take the other path and let the fix go together with changing
> the form-decoding to utf-8
> 
> 
> The remaining question being: Which path do people prefer? Are there
> clear arguments to rule out one or the other? do we vote?
> 
> -marc=
> PS: I do hope this clears out the confusion?

Yes, thank you Marc!

I would prefer iso-8859-1 but this is just a feeling and no opinion
based on facts ;-)

--
Reinhard


Re: [heads up] cocoon's default form-encoding and serialize-encoding are inconsistent.

Posted by Marc Portier <mp...@outerthought.org>.
Reinhard Poetz wrote:

> 
> The parameter CONTEXT_DEFAULT_ENCODING is set in Constants.java - how
> can I override this value?
> 

you don't:
its value IS NOT the encoding; its value is just the lookup-key inside 
the context used to read the DEFAULT_ENCODING

as for the remaining question 'where do I set the value then?'
there currently is a servlet init-param one can set inside the web.xml 
which is called 'form-encoding'

the whole reasoning built up in this thread has been to
1/ use that same setting as the default for our 
text-oriented-serializers (ie anything below AbstractTextSerializer in 
the inheritance chain) in order to avoid as much as possible the 
possible inconsistency we are facing now.

2/ implement this by adding that setting to the Context and letting the 
AbstractTextSerializer be Contextualizable

> 
>>
>>personally I think this patch should come together with a 
>>change to our 
>>web.xml so we rather change the default form-encoding to be 
>>also "utf-8"
> 
> 
> sorry, I don't understand this. Does this mean the general encoding is
> iso-8859-1 and the form encoding is UTF-8? If yes, why two different
> encodings?
> 

by now Joerg and Bruno have been adding enough to the thread to see that 
there are more than just two encodings in this world, and quite 
interestingly: they can all be different :-)

I understand that this can easily become confusing, and that is the main 
reason I didn't want to expand the discussion to any encodings 
other than the ones at hand here.




So as a recap:

Given the fact that today's browser behaviour is coupling
1. the encoding of the HTML-stream (from server to browser) describing 
the <form>
to
2. the encoding used to encode the request params in the HTTP-request 
hosting the form-submit (from browser to server),

the web-app-developer is kind of forced into making a decent effort to 
ensure that on the server-side he is decoding the request-params with 
the same encoding as was used to serialize the HTML.

The above observation made me label our current default-settings for 
both encodings inside Cocoon to be 'inconsistent':
- if you don't specify an encoding for the serializer (sitemap.xmap) 
it's utf-8
- if you don't specify an encoding for the form-decoding (web.xml) then 
it is iso-8859-1


To fix this I'd like to:
use the context as described above to communicate the chosen (or 
implicit) form-decoding to the AbstractTextSerializer so it can use that 
as a natural default-encoding (currently there is no such thing as a 
default-encoding for the AbstractTextSerializer resulting in it being 
chosen by xalan)

as a consequence however this would mean that the default-encoding for 
the serializers changes from utf-8 to iso-8859-1

we could take the other path and let the fix go together with changing 
the form-decoding to utf-8


The remaining question being: Which path do people prefer? Are there 
clear arguments to rule out one or the other? do we vote?

-marc=
PS: I do hope this clears up the confusion?
-- 
Marc Portier                            http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
Read my weblog at              http://radio.weblogs.com/0116284/
mpo@outerthought.org                              mpo@apache.org


Re: [heads up] cocoon's default form-encoding and serialize-encoding are inconsistent.

Posted by Bruno Dumon <br...@outerthought.org>.
On Sat, 2003-11-01 at 12:58, Joerg Heinicke wrote:
> Now I'm confused ...
> 
> With the container encoding all resources are read, i.e. my text files 
> and the request.

Nope, these are two different encodings:

* text files are read according to whatever encoding/locale is
configured in your OS (unless you supply special parameters when
starting the JVM)

* request parameters are always decoded using ISO-8859-1
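
To illustrate the first point, a small Java sketch (hypothetical class name,
nothing Cocoon-specific). Code that reads a file without naming a charset
gets the JVM default, which on the JVMs of this era follows the OS locale
(newer JDKs default to UTF-8 instead); I'm assuming the "special parameters"
are things like the file.encoding system property. Naming the charset
explicitly avoids the guesswork:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustration only: FileEncodingSketch is a hypothetical name.
final class FileEncodingSketch {

    /** Writes text to a temp file in one charset and reads it back in another. */
    static String writeThenRead(String text, Charset writeAs, Charset readAs) {
        try {
            Path tmp = Files.createTempFile("encoding-sketch", ".txt");
            try {
                Files.write(tmp, text.getBytes(writeAs));
                return new String(Files.readAllBytes(tmp), readAs);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // What "whatever is configured" resolves to on this JVM:
        System.out.println("JVM default charset: " + Charset.defaultCharset());
        // Same charset on both sides: the text survives.
        System.out.println(writeThenRead("caf\u00e9",
                StandardCharsets.UTF_8, StandardCharsets.UTF_8).equals("caf\u00e9"));
    }
}
```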

See also section 4.9 in the servlet 2.3 spec:

-- begin quote
Currently, many browsers do not send a char encoding qualifier with the
Content-Type header, leaving open the determination of the character
encoding for reading HTTP requests. The default encoding of a request
the container uses to create the request reader and parse POST data must
be ISO-8859-1, if none has been specified by the client request.
However, in order to indicate to the developer in this case the failure
of the client to send a character encoding, the container returns null
from the getCharacterEncoding method. If the client hasn't set character
encoding and the request data is encoded with a different encoding than
the default as described above, breakage can occur. To remedy this
situation, a new method setCharacterEncoding(String enc) has been added
to the ServletRequest interface. Developers can override the character
encoding supplied by the container by calling this method. It must be
called prior to parsing any post data or reading any input from the
request. Calling this method once data has been read will not affect the
encoding.
-- end quote

Since the mentioned setCharacterEncoding hasn't been supported for long (and
must be called before any request parameter is read), Cocoon has its own
mechanism to fix this, which does something like:

new String(value.getBytes(container_encoding), form_encoding);

container_encoding should always be ISO-8859-1 (unless you have a broken
servlet container), and form_encoding should be the same one as on your
serializer.
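
Here is a runnable sketch of that recoding step. It is not Cocoon's actual
code: the class and method names are invented, and it assumes the container
really did decode the bytes as ISO-8859-1. The trick works because
ISO-8859-1 maps every byte to exactly one character, so getBytes() recovers
the original bytes without loss.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustration only: FormRecodeSketch is a hypothetical name, not a Cocoon class.
final class FormRecodeSketch {

    /** Undoes the container's decoding and re-decodes the raw bytes with the form encoding. */
    static String recode(String value, Charset containerEncoding, Charset formEncoding) {
        return new String(value.getBytes(containerEncoding), formEncoding);
    }

    public static void main(String[] args) {
        // Simulate the container: a euro sign submitted as UTF-8 but decoded as ISO-8859-1.
        String fromContainer = new String("\u20ac".getBytes(StandardCharsets.UTF_8),
                StandardCharsets.ISO_8859_1);
        String repaired = recode(fromContainer,
                StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println(repaired.equals("\u20ac")); // prints "true"
    }
}
```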

>  The form encoding only recodes the request parameters 
> to the expected (i.e. container) encoding. So it works like a servlet 
> filter.
> 
> Joerg
> 
> On 01.11.2003 12:36, Bruno Dumon wrote:
> 
> > On Sat, 2003-11-01 at 12:24, Joerg Heinicke wrote:
> > 
> >>On 01.11.2003 12:08, Reinhard Poetz wrote:
> >>
> >>
> >>>>personally I think this patch should come together with a 
> >>>>change to our 
> >>>>web.xml so we rather change the default form-encoding to be 
> >>>>also "utf-8"
> >>>
> >>>
> >>>sorry, I don't understand this. Does this mean the general encoding is
> >>>iso-8859-1 and the form encoding is UTF-8? If yes, why two different
> >>>encodings?
> >>
> >>These are two different things.
> >>
> >>On the one hand there is the container encoding. It defines with which 
> >>encoding textfiles are read, e.g. properties files. It's about servlet 
> >>container <=> file system.
> >>
> > 
> > 
> > The "container encoding" mentioned here is the encoding with which the
> > servlet container decoded request parameters. The servlet spec says that
> > this should always be ISO-8859-1 (unless the client specified another
> > encoding or, from 2.3, request.setCharacterEncoding is used). This
> > parameter has nothing to do with the encoding used to decode e.g. text
> > files, and should normally always be left to ISO-8859-1.
> > 
> > Some more info about all this can be found on this wiki page:
> > http://wiki.cocoondev.org/Wiki.jsp?page=RequestParameterEncoding
> > 
> > 
> >>On the other hand there is the form encoding. It defines with which 
> >>encoding requests are read. It's about servlet container <=> clients.
> >>
> >>I hope it's correct so.
-- 
Bruno Dumon                             http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
bruno@outerthought.org                          bruno@apache.org


Re: [heads up] cocoon's default form-encoding and serialize-encoding are inconsistent.

Posted by Joerg Heinicke <jh...@virbus.de>.
Now I'm confused ...

With the container encoding all resources are read, i.e. my text files 
and the request. The form encoding only recodes the request parameters 
to the expected (i.e. container) encoding. So it works like a servlet 
filter.

Joerg

On 01.11.2003 12:36, Bruno Dumon wrote:

> On Sat, 2003-11-01 at 12:24, Joerg Heinicke wrote:
> 
>>On 01.11.2003 12:08, Reinhard Poetz wrote:
>>
>>
>>>>personally I think this patch should come together with a 
>>>>change to our 
>>>>web.xml so we rather change the default form-encoding to be 
>>>>also "utf-8"
>>>
>>>
>>>sorry, I don't understand this. Does this mean the general encoding is
>>>iso-8859-1 and the form encoding is UTF-8? If yes, why two different
>>>encodings?
>>
>>These are two different things.
>>
>>On the one hand there is the container encoding. It defines with which 
>>encoding textfiles are read, e.g. properties files. It's about servlet 
>>container <=> file system.
>>
> 
> 
> The "container encoding" mentioned here is the encoding with which the
> servlet container decoded request parameters. The servlet spec says that
> this should always be ISO-8859-1 (unless the client specified another
> encoding or, from 2.3, request.setCharacterEncoding is used). This
> parameter has nothing to do with the encoding used to decode e.g. text
> files, and should normally always be left to ISO-8859-1.
> 
> Some more info about all this can be found on this wiki page:
> http://wiki.cocoondev.org/Wiki.jsp?page=RequestParameterEncoding
> 
> 
>>On the other hand there is the form encoding. It defines with which 
>>encoding requests are read. It's about servlet container <=> clients.
>>
>>I hope it's correct so.


Re: [heads up] cocoon's default form-encoding and serialize-encoding are inconsistent.

Posted by Bruno Dumon <br...@outerthought.org>.
On Sat, 2003-11-01 at 12:24, Joerg Heinicke wrote:
> On 01.11.2003 12:08, Reinhard Poetz wrote:
> 
> >>personally I think this patch should come together with a 
> >>change to our 
> >>web.xml so we rather change the default form-encoding to be 
> >>also "utf-8"
> > 
> > 
> > sorry, I don't understand this. Does this mean the general encoding is
> > iso-8859-1 and the form encoding is UTF-8? If yes, why two different
> > encodings?
> 
> These are two different things.
> 
> On the one hand there is the container encoding. It defines with which 
> encoding textfiles are read, e.g. properties files. It's about servlet 
> container <=> file system.
> 

The "container encoding" mentioned here is the encoding with which the
servlet container decoded request parameters. The servlet spec says that
this should always be ISO-8859-1 (unless the client specified another
encoding or, from 2.3, request.setCharacterEncoding is used). This
parameter has nothing to do with the encoding used to decode e.g. text
files, and should normally always be left to ISO-8859-1.

Some more info about all this can be found on this wiki page:
http://wiki.cocoondev.org/Wiki.jsp?page=RequestParameterEncoding

> On the other hand there is the form encoding. It defines with which 
> encoding requests are read. It's about servlet container <=> clients.
> 
> I hope it's correct so.
-- 
Bruno Dumon                             http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
bruno@outerthought.org                          bruno@apache.org


Re: [heads up] cocoon's default form-encoding and serialize-encoding are inconsistent.

Posted by Joerg Heinicke <jh...@virbus.de>.
On 01.11.2003 12:08, Reinhard Poetz wrote:

>>personally I think this patch should come together with a 
>>change to our 
>>web.xml so we rather change the default form-encoding to be 
>>also "utf-8"
> 
> 
> sorry, I don't understand this. Does this mean the general encoding is
> iso-8859-1 and the form encoding is UTF-8? If yes, why two different
> encodings?

These are two different things.

On the one hand there is the container encoding. It defines with which 
encoding textfiles are read, e.g. properties files. It's about servlet 
container <=> file system.

On the other hand there is the form encoding. It defines with which 
encoding requests are read. It's about servlet container <=> clients.

I hope that's correct.

Joerg


RE: [heads up] cocoon's default form-encoding and serialize-encoding are inconsistent.

Posted by Reinhard Poetz <re...@apache.org>.
From: Marc Portier

> OK,
> 
> Thx to Carsten's suggestions I have a patch for this that roughly
> looks like this:
> 
> 
> 
> 1/ in src/java/org/apache/cocoon/Constants.java
>    . add constant  CONTEXT_DEFAULT_ENCODING
> 
> 
> 2/ in  
> src/java/org/apache/cocoon/serialization/AbstractTextSerializer.java
> 
>    . add imports for Contextualizable
>    . add interface to class declaration
>    . use contextualize method to set default encoding to what 
> is set in 
> the context
>    . note that the configure can still change it depending on the 
> sitemap conf
> 
> 
> 3/ in src/java/org/apache/cocoon/servlet/CocoonServlet.java
> 
>    . in the init() we add the default encoding to the context as read 
> from the servlet-initParameter "form-encoding"
> 
> 
> 
> now, since the last defaults to iso-8859-1 there is a bit of a 
> side-effect to this patch which I introduced in my original posting
> 
> 
> 
> >>> * While at it, shouldn't we kinda default to UTF-8 anyway? even if
> >>> that is not the default encoding of the servlet-container? (some 
> >>> gut-feeling argument: I think cocoon is closer to XML than to 
> >>> servlet-containers?)
> >>>
> 
> if I just apply the patch as described above the side-effect will be 
> that the default-serialization for all our text-serializers (unless 
> overridden by the config in the sitemap.xmap) will change from utf-8 
> (more precisely: whatever xalan defaults to) to iso-8859-1
> 
> 
> maybe that isn't that bad, but just wanted to make you all aware.
> do we need a vote on this, or do I just do as I deem best?

The parameter CONTEXT_DEFAULT_ENCODING is set in Constants.java - how
can I override this value?

> 
> 
> personally I think this patch should come together with a 
> change to our 
> web.xml so we rather change the default form-encoding to be 
> also "utf-8"

sorry, I don't understand this. Does this mean the general encoding is
iso-8859-1 and the form encoding is UTF-8? If yes, why two different
encodings?

Cheers,
Reinhard


Re: [heads up] cocoon's default form-encoding and serialize-encoding are inconsistent.

Posted by Marc Portier <mp...@outerthought.org>.
OK,

Thx to Carsten's suggestions I have a patch for this that roughly looks like this:



1/ in src/java/org/apache/cocoon/Constants.java
   . add constant  CONTEXT_DEFAULT_ENCODING


2/ in  src/java/org/apache/cocoon/serialization/AbstractTextSerializer.java

   . add imports for Contextualizable
   . add interface to class declaration
   . use contextualize method to set default encoding to what is set in 
the context
   . note that the configure can still change it depending on the 
sitemap conf


3/ in src/java/org/apache/cocoon/servlet/CocoonServlet.java

   . in the init() we add the default encoding to the context as read 
from the servlet-initParameter "form-encoding"
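
The three steps can be sketched in plain Java as follows. This is not the
actual patch: a Map stands in for the Avalon Context, the string value of the
lookup key is made up, and the method names are hypothetical. It only shows
the intended precedence (sitemap config first, then the context default fed
from form-encoding).

```java
import java.util.HashMap;
import java.util.Map;

// Illustration only: a plain-Java stand-in for the patch, with hypothetical names.
final class DefaultEncodingSketch {

    // Step 1: a lookup key (NOT the encoding itself). The string value is an assumption.
    static final String CONTEXT_DEFAULT_ENCODING = "default-encoding";

    /** Step 3: what init() would do, publishing the form-encoding init-param in the context. */
    static Map<String, String> buildContext(String formEncodingInitParam) {
        Map<String, String> context = new HashMap<>();
        // When the init-param is absent, the thread says form decoding defaults to ISO-8859-1.
        context.put(CONTEXT_DEFAULT_ENCODING,
                formEncodingInitParam != null ? formEncodingInitParam : "ISO-8859-1");
        return context;
    }

    /** Step 2: the serializer takes the context value as default; sitemap config may override it. */
    static String resolveSerializerEncoding(Map<String, String> context, String sitemapEncoding) {
        if (sitemapEncoding != null) {
            return sitemapEncoding;
        }
        return context.get(CONTEXT_DEFAULT_ENCODING);
    }

    public static void main(String[] args) {
        // Nothing configured anywhere: serializer and form decoding now agree.
        System.out.println(resolveSerializerEncoding(buildContext(null), null));          // ISO-8859-1
        // form-encoding set in web.xml, no serializer config: the serializer follows it.
        System.out.println(resolveSerializerEncoding(buildContext("UTF-8"), null));       // UTF-8
        // An explicit sitemap setting still wins.
        System.out.println(resolveSerializerEncoding(buildContext("UTF-8"), "ISO-8859-15")); // ISO-8859-15
    }
}
```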



now, since the latter defaults to iso-8859-1, there is a bit of a 
side-effect to this patch, which I already raised in my original posting:



>>> * While at it, shouldn't we kinda default to UTF-8 anyway? even if 
>>> that is not the default encoding of the servlet-container? (some 
>>> gut-feeling argument: I think cocoon is closer to XML than to 
>>> servlet-containers?)
>>>

if I just apply the patch as described above the side-effect will be 
that the default-serialization for all our text-serializers (unless 
overridden by the config in the sitemap.xmap) will change from utf-8 
(more precisely: whatever xalan defaults to) to iso-8859-1


maybe that isn't that bad, but just wanted to make you all aware.
do we need a vote on this, or do I just do as I deem best?


personally I think this patch should come together with a change to our 
web.xml, so that the default form-encoding becomes "utf-8" as well

other opinions?


thx for any feedback
-marc=
-- 
Marc Portier                            http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
Read my weblog at              http://radio.weblogs.com/0116284/
mpo@outerthought.org                              mpo@apache.org


RE: [heads up] cocoon's default form-encoding and serialize-encoding are inconsistent.

Posted by Carsten Ziegeler <cz...@s-und-n.de>.
Marc Portier wrote:
> 
> However, what does this say about using cocoon outside the 
> servlet-context?
> 
> In any case: I like the idea of using the contextualize() but maybe it 
> makes more sense if we don't introduce a dependency between 
> AbstractTextSerializer and servlet.jar?
> 
Ok, that's true :(

> So maybe the Context directly should be augmented with access to this 
> kind of 'global' info?
> 
Yes, why not. Perhaps adding all parameters (either from web.xml or
cli) to the context?

Carsten

Re: [heads up] cocoon's default form-encoding and serialize-encoding are inconsistent.

Posted by Marc Portier <mp...@outerthought.org>.

Carsten Ziegeler wrote:

> FYI, the Context (you get via Contextualizable) contains the ServletConfig
> via a constant defined in the CocoonServlet.
> This is something your two colleagues were wondering about, but it might
> be that it helps you :)
> 

Yep, sounds like a way out of the stalemate I was facing..

However, what does this say about using cocoon outside the servlet-context?

In any case: I like the idea of using the contextualize() but maybe it 
makes more sense if we don't introduce a dependency between 
AbstractTextSerializer and servlet.jar?

So maybe the Context directly should be augmented with access to this 
kind of 'global' info?

-marc=

> Carsten

-- 
Marc Portier                            http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
Read my weblog at              http://radio.weblogs.com/0116284/
mpo@outerthought.org                              mpo@apache.org


RE: [heads up] cocoon's default form-encoding and serialize-encoding are inconsistent.

Posted by Carsten Ziegeler <cz...@s-und-n.de>.
FYI, the Context (you get via Contextualizable) contains the ServletConfig
via a constant defined in the CocoonServlet.
This is something your two colleagues were wondering about, but it might
be that it helps you :)

Carsten
