Posted to users@cocoon.apache.org by Mark Lundquist <ml...@wrinkledog.com> on 2004/08/22 00:20:21 UTC

container-encoding vs. form-encoding... bug?

Hi,

I'm using Cocoon 2.1.5.1 w/ Jetty 4.2.15.  Xalan was throwing a 
SAXException trying to write a character (U+2026, &hellip;) that's not 
representable "in the specified output encoding iso-8859-1".
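
The error message itself can be checked outside Cocoon. The following is 
a minimal sketch using only the JDK's java.nio.charset API (it needs a 
Java 7+ JDK for StandardCharsets; the class and method names are made up 
for illustration and are not part of Cocoon or Xalan). It asks each 
charset whether it can encode U+2026 at all:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EllipsisEncodingCheck {

    /** Returns true if every character of the text can be encoded in the given charset. */
    public static boolean canEncode(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String ellipsis = "\u2026"; // HORIZONTAL ELLIPSIS, i.e. &hellip;

        // ISO-8859-1 only covers U+0000..U+00FF, so this prints false.
        System.out.println("ISO-8859-1: " + canEncode(ellipsis, StandardCharsets.ISO_8859_1));

        // UTF-8 covers all of Unicode, so this prints true.
        System.out.println("UTF-8:      " + canEncode(ellipsis, StandardCharsets.UTF_8));
    }
}
```

So whichever component complains here is really being asked to write 
iso-8859-1, regardless of what the stylesheet says.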

I made sure I had <xml:output encoding="UTF-8"> everywhere, but the 
problem persisted.  Finally I figured out that I needed to check the 
encoding parameters in web.xml.  Sure enough, container-encoding and 
form-encoding were not set, and the comments indicate that they default 
to iso-8859-1.

So I set the container-encoding to UTF-8, and that didn't have any 
effect.  Only when I set form-encoding to UTF-8 did my problem go away. 
The thing is, the character that was causing the problem isn't coming 
from the request!  I expected container-encoding to be the one that 
would affect the behavior I was seeing.

So, am I just not understanding something correctly?  Or is it a bug, 
and if so is it a problem with Cocoon or with Jetty?

Cheers,
Mark


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@cocoon.apache.org
For additional commands, e-mail: users-help@cocoon.apache.org


Re: container-encoding vs. form-encoding... bug?

Posted by Marc Portier <mp...@outerthought.org>.
this wiki article should explain everything:
http://wiki.apache.org/cocoon/RequestParameterEncoding


Mark Lundquist wrote:

> Hi,
> 
> I'm using Cocoon 2.1.5.1 w/ Jetty 4.2.15.  Xalan was throwing a 
> SAXException trying to write a character (U+2026, &hellip;) that's not 
> representable "in the specified output encoding iso-8859-1".
> 

probably somewhere in the serializer

> I made sure I had <xml:output encoding="UTF-8"> everywhere, but the 

to no avail (and I assume you wanted to type <xsl:output ...>  )

this directive is used by Xalan only when the engine is operating in a 
mode where it has to transform AND serialize

Cocoon, however (having its reasons to separate the two operations), will 
override this line in the XSL anyway... for Cocoon the end result of a 
transformer has to be pure SAX events, which are piped through a 
serializer later on

Since Cocoon overrides it anyway, the <xsl:output ...> in your 
stylesheets is only useful for easing your debugging work: it lets you 
see the output of your stylesheet in your favourite encoding (and with 
whatever other output params you like).

(for API geeks, see: 
http://java.sun.com/j2se/1.4.2/docs/api/javax/xml/transform/Transformer.html#setOutputProperties(java.util.Properties) )
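
The override mechanism in that API can be sketched with plain JAXP. This 
is only an illustration of Transformer.setOutputProperties, not Cocoon's 
actual code; the stylesheet and the class and method names are 
hypothetical. The stylesheet asks for ISO-8859-1 via <xsl:output>, and 
the caller then supplies its own encoding:

```java
import java.io.StringReader;
import java.util.Properties;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamSource;

public class OutputEncodingOverride {

    // Hypothetical stylesheet that requests ISO-8859-1 through xsl:output.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' encoding='ISO-8859-1'/>"
      + "<xsl:template match='/'><out><xsl:value-of select='.'/></out></xsl:template>"
      + "</xsl:stylesheet>";

    /**
     * Builds a Transformer from STYLESHEET, applies a caller-supplied
     * encoding, and returns the encoding the Transformer reports afterwards.
     */
    public static String effectiveEncoding(String override) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));

            // Properties set here take precedence over the stylesheet's xsl:output.
            Properties props = new Properties();
            props.setProperty(OutputKeys.ENCODING, override);
            transformer.setOutputProperties(props);

            return transformer.getOutputProperty(OutputKeys.ENCODING);
        } catch (TransformerException e) {
            throw new IllegalStateException("could not build transformer", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("encoding in effect: " + effectiveEncoding("UTF-8"));
    }
}
```

The point is simply that whoever drives the transformer gets the last 
word on the output properties, not the stylesheet.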

> problem persisted.  Finally I figured out that I needed to check the 
> encoding parameters in web.xml.  Sure enough, container-encoding and 
> form-encoding were not set, and the comments indicate that they default 
> to iso-8859-1.
> 
> So I set the container-encoding to UTF-8, and that didn't have any 
> effect.  Only when I set form-encoding to UTF-8 did my problem go away. 

container-encoding should be set to the encoding your chosen container 
(Jetty) uses to decode (the body of) HTTP requests

most containers use iso-8859-1 here, so you should just leave it alone 
unless you know your container does it differently

a recent post showed that Jetty lets you set it yourself by 
specifying a system property -Dorg.mortbay.util.URI.charset=UTF-8
(see: http://marc.theaimsgroup.com/?l=xml-cocoon-dev&m=109273705513761&w=2 )

so only when you are playing with this should you change the 
container-encoding in web.xml

>  The thing is, the character that was causing the problem isn't coming 
> from the request!  I expected container-encoding to be the one that 
> would affect the behavior I was seeing.
> 

as you have found out by now, the container-encoding setting only comes 
into play when the HTTP request's body is read in some way

> So, am I just not understanding something correctly?  Or is it a bug, 
> and if so is it a problem with Cocoon or with Jetty?
> 

what really needs to happen in this story is telling the SERIALIZER in 
Cocoon what encoding to use

it's quite logical: the <xsl:output ...> directive is overridden on the 
transformer side, so that info has to be injected again somewhere. Since 
it concerns the serialization part of things, you should give it to 
the serializer. So how do you do that?

1/ You do that on a local level (one serializer) by applying the hints 
Jan just gave in his post. (setting map:serializer/@mime-type and 
./encoding)

2/ You do that on a global level (default for all text-serializers) by 
doing what you did: setting the form-encoding in web.xml.

Historically that setting also comes into play for request parameters. 
However, there is a 'bug' (well, maybe rather a 'historic way of 
interpreting' the specs) in most browsers: they encode the request 
parameters they submit with the same encoding as the one applicable to 
the form that asked for them.  Because of this client-side coupling we 
opted to make the configured form-encoding the default for our 
serializers as well.
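
How the two settings interact on the request side can be sketched in a 
few lines of Java. This is an illustration of the idea described in the 
wiki article, not Cocoon's actual code (class and method names are 
hypothetical): the container has already decoded the bytes using 
container-encoding, so the value is turned back into bytes with that 
encoding and decoded again with form-encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ParameterRecoding {

    /**
     * Undoes the container's decoding and re-decodes the same bytes with
     * the encoding the browser really used.
     */
    public static String recode(String raw, Charset containerEncoding, Charset formEncoding) {
        return new String(raw.getBytes(containerEncoding), formEncoding);
    }

    public static void main(String[] args) {
        String sent = "caf\u00e9 \u2026"; // what the user typed into a UTF-8 form

        // The browser submits the value as UTF-8 bytes ...
        byte[] wire = sent.getBytes(StandardCharsets.UTF_8);

        // ... but the container decodes them as ISO-8859-1 (garbled text).
        String misdecoded = new String(wire, StandardCharsets.ISO_8859_1);

        // container-encoding=ISO-8859-1 plus form-encoding=UTF-8 repairs it.
        String repaired = recode(misdecoded, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);

        System.out.println("garbled : " + misdecoded.equals(sent)); // false
        System.out.println("repaired: " + repaired.equals(sent));   // true
    }
}
```

The round trip only works because ISO-8859-1 maps every byte to exactly 
one character, so nothing is lost on the way back. If container-encoding 
does not match what the container really used, the bytes recovered in 
recode() are wrong and the result stays garbled.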


HTH,
-marc=
-- 
Marc Portier                            http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
Read my weblog at                http://blogs.cocoondev.org/mpo/
mpo@outerthought.org                              mpo@apache.org



Re: container-encoding vs. form-encoding... bug?

Posted by Jan Hoskens <jh...@schaubroeck.be>.
You should set the container-encoding to ISO-8859-1 and leave the 
form-encoding as UTF-8. If I remember correctly, the container-encoding 
is a thing introduced with Servlet API 2.3, while Cocoon was coping with 
2.2. The latter passed everything in ISO-8859-1, and Cocoon expects it 
to be ISO-8859-1 (this will probably change in later Cocoon versions). 
Remember to set the encoding in your serializers too (two places to look 
at: the element "encoding" AND the attribute "mime-type"):


<map:serializer logger="sitemap.serializer.xhtml"
                mime-type="text/html; charset=UTF-8"
                name="xhtml" pool-grow="2" pool-max="64" pool-min="10"
                src="org.apache.cocoon.serialization.XMLSerializer">
  <doctype-public>-//W3C//DTD XHTML 1.0 Strict//EN</doctype-public>
  <doctype-system>http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd</doctype-system>
  <encoding>utf-8</encoding>
</map:serializer>


Kind regards,
Jan
