Posted to users@cocoon.apache.org by Tricia Williams <pg...@student.cs.uwaterloo.ca> on 2006/08/04 22:02:10 UTC
Uppercase UTF-8 Diacritic characters
Hi All,
I was hoping that someone might be able to give me some insight. I am
using flowscript to redirect input from a form to another service which
gives me the xml results I want which can then be processed by subsequent
cocoon pipelines to provide the display I want. This works wonderfully
for small letters with diacritics such as the Latin Small Letter E With
Acute but the encoding does not seem to be maintained for its
corresponding upper-case symbol, Latin Capital Letter E With Acute. By
isolating this part of the flow, I have confirmed that the encoding error
occurs before the input reaches the other service. To me this indicates
that something during or before the flowscript is causing this strange
behavior.
I am using a stand-alone local instance of Cocoon 2.1.9 with Java
1.5.0_07 and Mozilla Firefox 1.5.0.6 on Windows XP Professional 2002.
The following are the relevant parts of the sitemap:
<map:match pattern="search">
  <map:act type="locale">
    <map:generate type="file" src="cocoon:/peelsolruri" label="content">
      <map:parameter name="tomcatPort" value="{global:tomcatPort}"/>
    </map:generate>
    ....
  </map:act>
</map:match>
<map:match pattern="peelsolruri">
  <map:act type="set-encoding">
    <map:parameter name="form-encoding" value="UTF-8"/>
  </map:act>
  <map:act type="locale">
    <map:call function="peelsolruri">
      <map:parameter name="tomcatPort" value="{global:tomcatPort}"/>
      <map:parameter name="locale" value="{language}"/>
    </map:call>
  </map:act>
</map:match>
In the flowscript, several parameters are passed to a Java class and are
manipulated slightly as Java Strings to form the serviceURI. The
serviceURI is then used in cocoon.redirectTo( serviceURI, false );
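One thing that may be worth trying at this step is percent-encoding the
parameter values as UTF-8 when the serviceURI is assembled, so that only
ASCII characters travel through the redirect. The following is a minimal
sketch of that idea, not the actual class from this setup: the class name
ServiceUriSketch, the method buildServiceUri, the base URI, and the
parameter name "q" are all hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper; the names and the base URI are illustrative only.
final class ServiceUriSketch {

    // Appends one query parameter, percent-encoding name and value as UTF-8.
    static String buildServiceUri(String baseUri, String name, String value) {
        try {
            return baseUri + "?" + URLEncoder.encode(name, "UTF-8")
                    + "=" + URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every Java platform must support.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // "\u00c9" is Latin Capital Letter E With Acute.
        System.out.println(
                buildServiceUri("http://localhost:8080/select", "q", "\u00c9cole"));
        // prints http://localhost:8080/select?q=%C3%89cole
    }
}
```

This only helps if the String arriving in the Java class already holds the
correct characters. If it is already garbled at that point, encoding it
would simply preserve the garbled form.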
Using cocoon.log.info, I compared the value of the parameter carrying the
capital diacritic characters before and after it is manipulated in the
Java class, and the value appears to be the same at both points. What the
log shows, however, is each byte of the UTF-8 encoding rendered as a
separate character. That is, the Latin Small Letter E With Acute (UTF-8
c3 a9) becomes Latin Capital Letter A With Tilde (Unicode 00C3) and
Copyright Sign (Unicode 00A9). The Latin Capital Letter E With Acute
(UTF-8 c3 89) becomes Latin Capital Letter A With Tilde (Unicode 00C3)
and Question Mark (?). This is what appears in the log file. For the
lower-case letter, the value is interpreted correctly (or as I want it to
be) by cocoon.redirectTo, but not so for the capital.
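The pattern in the log is consistent with UTF-8 bytes being decoded as
ISO-8859-1 somewhere before the value is logged, although that is an
assumption and not something the log output alone proves. The sketch
below reproduces the effect in plain Java. The class and method names
(MojibakeSketch, misdecode, repair, codePoints) are hypothetical and are
not part of Cocoon.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demonstration class; not part of Cocoon.
final class MojibakeSketch {

    // Simulates reading UTF-8 encoded bytes with the wrong charset.
    static String misdecode(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Reverses misdecode: recovers the bytes, then decodes them as UTF-8.
    static String repair(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Formats each char of s as U+XXXX, separated by spaces.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Small e with acute: bytes c3 a9
        System.out.println(codePoints(misdecode("\u00e9"))); // U+00C3 U+00A9
        // Capital E with acute: bytes c3 89
        System.out.println(codePoints(misdecode("\u00c9"))); // U+00C3 U+0089
        // The round trip restores the original character.
        System.out.println(codePoints(repair(misdecode("\u00c9")))); // U+00C9
    }
}
```

In ISO-8859-1, byte a9 maps to the printable Copyright Sign, while byte 89
maps to U+0089, a control character with no glyph. That may be why the log
shows a question mark for the capital letter only, and it suggests the
second byte could be lost wherever the value is next converted to an
encoding that cannot represent U+0089.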
I'm quite puzzled by this behavior and am not sure where else to look for
a solution. If anyone has any specific/general advice or suggestions of
things to try, I would be grateful.
Thanks,
Tricia