Posted to users@cocoon.apache.org by Peter Flynn <pf...@ucc.ie> on 2010/12/17 16:06:37 UTC
Encoding
I restored the Xalan settings after (failing to) add Saxon by copying
Emacs' ~ backup copies of cocoon.xconf and sitemap.xmap, but now
suddenly there are Unicode replacement characters (U+FFFD) appearing for
accents in pages which were working before.
The data is taken from a feed from an Oracle Application Server giving an
HTML <table> fragment, e.g.
http://rss.ucc.ie/live/w_rms_profile_list.show?p_school_id=A005
which dog and wget identify in the headers as
Content-Type: text/html; charset=WINDOWS-1252
(yes, I know, yuck...not my server)
[That URI may not be accessible off-campus]
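One way accents turn into U+FFFD is for bytes in a single-byte encoding such as Windows-1252 to be decoded as UTF-8 somewhere along the chain. The thread doesn't establish that this is what happened here. The following is only a minimal Java sketch of that mechanism, and the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo (not part of Cocoon): what happens when text sent as
// Windows-1252 is decoded with the wrong charset.
class ReplacementCharDemo {

    static final Charset WINDOWS_1252 = Charset.forName("windows-1252");

    // Encode text the way a server declaring charset=WINDOWS-1252 would send it.
    static byte[] asWindows1252(String text) {
        return text.getBytes(WINDOWS_1252);
    }

    // Decode bytes with the given charset. The String constructor never
    // throws on bad input: malformed byte sequences become U+FFFD.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // U+00E9 (e acute) is the single byte 0xE9 in Windows-1252
        byte[] raw = asWindows1252("caf\u00e9");
        // Correct: decode with the charset the feed declares
        System.out.println(decode(raw, WINDOWS_1252));
        // Wrong: 0xE9 starts a 3-byte UTF-8 sequence that never completes,
        // so the decoder substitutes U+FFFD
        System.out.println(decode(raw, StandardCharsets.UTF_8));
    }
}
```

Because the substitution is lossy, the accent cannot be recovered afterwards; the bytes have to be decoded with the charset the feed declares (WINDOWS-1252 here) at the point where they are read.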
This is processed by a pipeline to ensure it is XML:
<map:match pattern="people-in-schools/*">
  <map:generate type="html"
      src="http://rss.ucc.ie/dev/w_rms_profile_list.show?p_school_id={1}"/>
  <map:serialize type="xml"/>
</map:match>
so that
http://publish.ucc.ie/researchprofiles/people-in-schools/A005
produces XML I can consume in my XSLT. However, this is appearing as:
<?xml version="1.0" encoding="ISO-8859-1"?><html...etc
despite the fact that the sitemap.xmap says very clearly:
<map:serializer logger="sitemap.serializer.xml"
    mime-type="application/xml" name="xml"
    src="org.apache.cocoon.serialization.XMLSerializer">
  <encoding>UTF-8</encoding>
</map:serializer>
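For comparison, here is what an explicit output encoding does in plain JAXP (the TrAX API that Xalan implements). This is a sketch, not Cocoon's XMLSerializer: it assumes the serializer's <encoding> element is passed on as the equivalent of the OutputKeys.ENCODING property, and the class name is hypothetical. It shows that the encoding property controls both the XML declaration and the bytes actually written.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo: serialize XML with an explicit output encoding.
class SerializeWithEncoding {

    // Run an identity transform and return the serialized bytes.
    static byte[] serialize(String xml, String encoding) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.METHOD, "xml");
            identity.setOutputProperty(OutputKeys.ENCODING, encoding);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            identity.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toByteArray();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<p>caf\u00e9</p>";
        // Same document, two encodings: the declaration and the bytes both
        // follow the property (e acute is 2 bytes in UTF-8, 1 in ISO-8859-1).
        System.out.println(new String(serialize(xml, "UTF-8"), StandardCharsets.UTF_8));
        System.out.println(new String(serialize(xml, "ISO-8859-1"), StandardCharsets.ISO_8859_1));
    }
}
```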
The result is that the output at
http://publish.ucc.ie/researchprofiles/A005
has Unicode replacement characters instead of accents.
I thought it would enforce conversion to UTF-8, but obviously I have
missed something... but what?
---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@cocoon.apache.org
For additional commands, e-mail: users-help@cocoon.apache.org
Re: Encoding
Posted by Christopher Schultz <ch...@christopherschultz.net>.
Laurent,
On 12/17/2010 11:25 AM, Laurent Medioni wrote:
> Have a look at http://wiki.apache.org/tomcat/FAQ/CharacterEncoding I
> think the comment you refer to tries to say that if no
> charset/encoding is set when producing a response then assume the
> ISO-8859-1 default value (do not ask why ;) ).
That wiki page explains why. I know because I wrote it :) It's all in
the servlet and HTTP specifications.
There's actually an open issue in Tomcat that proposes to switch the
default request body encoding /and/ URI encoding to UTF-8. Comments welcome:
https://issues.apache.org/bugzilla/show_bug.cgi?id=48550
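The rule described above (no charset in the Content-Type means ISO-8859-1) can be sketched as a small helper. charsetOf is hypothetical, not a servlet or Cocoon API; it handles only a simple charset=... parameter, and Charset.forName throws for names the JVM doesn't know.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper, not a servlet or Cocoon API.
class DefaultCharset {

    // Return the charset named in a Content-Type value, or ISO-8859-1
    // when no charset parameter is present.
    static Charset charsetOf(String contentType) {
        if (contentType != null) {
            for (String param : contentType.split(";")) {
                String p = param.trim();
                // Parameter names are case-insensitive
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String name = p.substring("charset=".length()).trim();
                    // Allow a quoted value such as charset="UTF-8"
                    if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                        name = name.substring(1, name.length() - 1);
                    }
                    return Charset.forName(name); // throws if the JVM doesn't know the name
                }
            }
        }
        return StandardCharsets.ISO_8859_1;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=WINDOWS-1252")); // windows-1252
        System.out.println(charsetOf("text/html"));                       // ISO-8859-1
    }
}
```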
-chris
RE: Encoding
Posted by Laurent Medioni <lm...@odyssey-group.com>.
Setting container-encoding to UTF-8 lets you share your servlet container (which keeps its Latin-1 default) with other applications that don't support UTF-8 (and don't fiddle with encodings), if that's relevant for you (it was in our case).
Have a look at http://wiki.apache.org/tomcat/FAQ/CharacterEncoding
I think the comment you refer to is trying to say that if no charset/encoding is set when producing a response, the ISO-8859-1 default is assumed (don't ask why ;) ).
Alternatively, try doing the equivalent of response.setContentType("text/html; charset=UTF-8") in your XSL (<xsl:output encoding='utf-8'/>? Sorry, that's from memory; I'm not an XSL specialist), so that you don't get the default encoding back.
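The xsl:output suggestion can be tried outside Cocoon with the JDK's built-in XSLT processor. In this minimal sketch (class name hypothetical), the stylesheet's <xsl:output encoding='UTF-8'/> sets the encoding of both the XML declaration and the bytes written. Whether Cocoon honours xsl:output at all is an assumption to check: as far as I know, in a Cocoon pipeline the stylesheet's result is passed on to the sitemap serializer, whose own configuration governs the output.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo: let xsl:output choose the output encoding.
class XslOutputEncoding {

    // Copies the input unchanged; only xsl:output differs from the default.
    static final String XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' encoding='UTF-8'/>"
        + "<xsl:template match='/'><xsl:copy-of select='.'/></xsl:template>"
        + "</xsl:stylesheet>";

    // Transform the input and return the raw bytes the processor wrote.
    static byte[] transform(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSL)));
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toByteArray();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = transform("<p>caf\u00e9</p>");
        // The bytes are UTF-8 because the stylesheet asked for it
        System.out.println(new String(bytes, StandardCharsets.UTF_8));
    }
}
```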
Laurent
-----Original Message-----
From: Peter Flynn [mailto:pflynn@ucc.ie]
Sent: Friday, 17 December 2010 17:06
To: users@cocoon.apache.org
Subject: Re: Encoding
[...]
____________________________________________________________
• This email and any files transmitted with it are CONFIDENTIAL and intended
solely for the use of the individual or entity to which they are addressed.
• Any unauthorized copying, disclosure, or distribution of the material within
this email is strictly forbidden.
• Any views or opinions presented within this e-mail are solely those of the
author and do not necessarily represent those of Odyssey Financial
Technologies SA unless otherwise specifically stated.
• An electronic message is not binding on its sender. Any message referring to
a binding engagement must be confirmed in writing and duly signed.
• If you have received this email in error, please notify the sender immediately
and delete the original.
Re: Encoding
Posted by Peter Flynn <pf...@ucc.ie>.
On 17/12/10 15:37, Laurent Medioni wrote:
> What is your
> <init-param>
> <param-name>container-encoding</param-name>
> <param-value>UTF-8</param-value>
> </init-param>
> In web.xml ?
Interesting. It's set to ISO-8859-1, because the comment says:
<!--
Set encoding used by the container. If not set the ISO-8859-1 encoding
will be assumed.
Since the servlet specification requires that the ISO-8859-1 encoding
is used (by default), you should never change this value unless
you have a buggy servlet container.
-->
I wouldn't call Tomcat buggy, exactly, but the servlet spec made a poor
choice in making ISO-8859-1 the default, given that the rest of the
planet is going down the UTF-{8|16|32|64} road :-)
Changing it to UTF-8 certainly fixes the problem, though... very many thanks.
///Peter
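The thread doesn't spell out what Cocoon does with container-encoding. A common reason for an application to know its container's encoding (an assumption here, not confirmed for Cocoon in this thread) is to repair text the container decoded with the wrong charset: re-encode the string with the charset the container used, then decode the recovered bytes with the charset the client really sent. A minimal sketch; redecode is a hypothetical helper.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo of repairing a wrong decode; not Cocoon's code.
class RedecodeDemo {

    // Re-encode with the charset that was wrongly used, then decode the
    // recovered bytes with the charset the text was really sent in.
    static String redecode(String misread, Charset usedByContainer, Charset actual) {
        byte[] original = misread.getBytes(usedByContainer);
        return new String(original, actual);
    }

    public static void main(String[] args) {
        byte[] sent = "caf\u00e9".getBytes(StandardCharsets.UTF_8);     // 63 61 66 C3 A9
        String misread = new String(sent, StandardCharsets.ISO_8859_1); // "caf" + U+00C3 + U+00A9
        String fixed = redecode(misread, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println(fixed.equals("caf\u00e9")); // true
    }
}
```

This only works because ISO-8859-1 maps every byte to one character and back. If the wrong decode had been UTF-8, the U+FFFD substitutions would already have destroyed the original bytes.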
RE: Encoding
Posted by Laurent Medioni <lm...@odyssey-group.com>.
What is your
<init-param>
<param-name>container-encoding</param-name>
<param-value>UTF-8</param-value>
</init-param>
In web.xml ?
Re: Encoding
Posted by Peter Flynn <pf...@ucc.ie>.
On 17/12/10 15:06, Peter Flynn wrote:
[...]
> The result is that the output at
> http://publish.ucc.ie/researchprofiles/A005
> has Unicode replacement characters instead of accents.
Curiouser and curiouser: that page is served as UTF-8, but lower down it says:
<!DOCTYPE html
PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en-ie">
<head xmlns="" xmlns:h="http://www.w3.org/1999/xhtml">
<meta http-equiv="Content-Type"
content="text/html; charset=ISO-8859-1">
<!--School: A005; Researcher-in-School: ; Real School: -->
<meta content="no-cache" http-equiv="Pragma">
That is generated by
<xsl:template match="h:head">
<head>
<xsl:comment>
<xsl:text>School: </xsl:text>
<xsl:value-of select="$school"/>
<xsl:text>; Researcher-in-School: </xsl:text>
<xsl:value-of select="$researcher-in-school"/>
<xsl:text>; Real School: </xsl:text>
<xsl:value-of select="$real-school"/>
</xsl:comment>
<meta http-equiv="Pragma" content="no-cache"/>
so WTF is that
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
coming from? Is Cocoon sticking it in by itself? The page template which
I take for the framework is
http://www.ucc.ie/en/old-design-base/
and that says quite clearly
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
Something, somewhere is sticking a bogus encoding in the works.
///Peter
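One plausible source, offered as an assumption rather than a diagnosis: the XSLT 1.0 html output method is specified to add a META element, immediately after the head start tag, naming the character encoding actually used, and as far as I know Xalan-derived serializers (including the JDK's built-in one) do this by default. The <head> in the output above is in no namespace (xmlns=""), so an HTML serializer would treat it as an HTML head. This minimal Java sketch (hypothetical class name) serializes a page that has no META with the html method and ISO-8859-1.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo: the html output method inserts its own META tag.
class HtmlMetaDemo {

    // Serialize with method="html" and the given encoding; return the markup.
    static String serializeHtml(String xml, String encoding) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.METHOD, "html");
            t.setOutputProperty(OutputKeys.ENCODING, encoding);
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // head is in no namespace, and the input contains no META element
        String page = "<html><head><title>A005</title></head><body><p>x</p></body></html>";
        // Expect a META http-equiv="Content-Type" naming ISO-8859-1 inside <head>
        System.out.println(serializeHtml(page, "ISO-8859-1"));
    }
}
```

If that is the mechanism, the META reflects whatever encoding the final serializer was given, so the serializer at the end of that page's pipeline, and its encoding setting, would be the place to look.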