Posted to users@cocoon.apache.org by Tricia Williams <pg...@student.cs.uwaterloo.ca> on 2006/08/04 22:02:10 UTC

Uppercase UTF-8 Diacritic characters

Hi All,

    I was hoping that someone might be able to give me some insight.  I am 
using flowscript to redirect input from a form to another service, which 
returns the XML results I want.  Subsequent Cocoon pipelines then process 
those results to provide the display I want.  This works wonderfully for 
small letters with diacritics such as Latin Small Letter E With Acute, but 
the encoding does not seem to be maintained for the corresponding 
upper-case character, Latin Capital Letter E With Acute.  By isolating 
this part of the flow, I have confirmed that the encoding error occurs 
before the input reaches the other service.  To me this indicates that 
something during or before the flowscript is causing this strange 
behavior.

    I am using a stand-alone local instance of Cocoon 2.1.9 with Java 
1.5.0_07 and Mozilla Firefox 1.5.0.6 on Windows XP Professional 2002.
    The following are the relevant parts of the sitemap:

<map:match pattern="search">
   <map:act type="locale">
     <map:generate type="file" src="cocoon:/peelsolruri" label="content">
       <map:parameter name="tomcatPort" value="{global:tomcatPort}"/>
     </map:generate>
     ....
   </map:act>
</map:match>

<map:match pattern="peelsolruri">
   <map:act type="set-encoding">
     <map:parameter name="form-encoding" value="UTF-8"/>
   </map:act>
   <map:act type="locale">
     <map:call function="peelsolruri">
       <map:parameter name="tomcatPort" value="{global:tomcatPort}"/>
       <map:parameter name="locale" value="{language}"/>
     </map:call>
   </map:act>
</map:match>

In the flowscript, several parameters are passed to a Java class, where 
they are manipulated slightly as Java Strings to form the serviceURI.  
serviceURI is then used in cocoon.redirectTo( serviceURI, false );
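
For illustration only, here is a minimal sketch of how such a class might 
assemble the serviceURI.  The class name, method name, base URI and 
parameter name are all hypothetical, and whether the real class 
percent-encodes the values is an assumption (it is not stated above).  
The sketch uses the Charset overload of URLEncoder.encode from newer 
JDKs; on Java 1.5 the equivalent call is URLEncoder.encode(value, "UTF-8").

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class ServiceUriSketch {

    // Hypothetical helper: appends one query parameter to a base URI,
    // percent-encoding the value as UTF-8 so that non-ASCII characters
    // are carried as plain ASCII escapes.
    static String buildServiceUri(String baseUri, String paramName, String value) {
        return baseUri + "?" + paramName + "="
                + URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // \u00C9 is Latin Capital Letter E With Acute.
        System.out.println(
                buildServiceUri("http://localhost:8080/select", "q", "\u00C9cole"));
    }
}
```

With the value percent-encoded this way, the URI contains only ASCII, so 
the capital and the small letter travel through the redirect in the same 
form (%C3%89 and %C3%A9 respectively).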

Using cocoon.log.info, I compared the value of the parameter carrying the 
capital diacritic characters before and after it is manipulated in the 
Java class, and the value appears to be the same at both points.  The 
output of cocoon.log.info shows each byte of the UTF-8 encoded value as 
a separate character.  That is, Latin Small Letter E With Acute (UTF-8 
c3 a9) becomes Latin Capital Letter A With Tilde (Unicode 00C3) and 
Copyright Sign (Unicode 00A9).  Latin Capital Letter E With Acute (UTF-8 
c3 89) becomes Latin Capital Letter A With Tilde (Unicode 00C3) and 
Question Mark (?).  This is what appears in the log file.  The lower-case 
value is still interpreted correctly (or as I want it to be) by 
cocoon.redirectTo, but the capital is not.
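
The logged values are consistent with UTF-8 bytes being decoded as 
ISO-8859-1 and the log file then being written in windows-1252.  The 
sketch below reproduces that pattern under those two assumptions; it 
does not establish that this is what Cocoon actually does.  The class 
and method names are hypothetical.  Under ISO-8859-1, byte 89 decodes to 
the control character U+0089, which has no windows-1252 mapping and is 
therefore written as "?", whereas byte a9 decodes to U+00A9, which 
windows-1252 can represent.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeSketch {

    // Assumption: the UTF-8 bytes of the input are decoded as ISO-8859-1.
    static String misdecode(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8),
                StandardCharsets.ISO_8859_1);
    }

    // Assumption: the log file is written in windows-1252; unmappable
    // characters are replaced with '?' by String.getBytes.
    static String asWrittenToLog(String s) {
        Charset cp1252 = Charset.forName("windows-1252");
        return new String(s.getBytes(cp1252), cp1252);
    }

    // Reverses misdecode(): recover the original bytes, decode as UTF-8.
    static String repair(String s) {
        return new String(s.getBytes(StandardCharsets.ISO_8859_1),
                StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String lower = misdecode("\u00E9"); // small e with acute
        String upper = misdecode("\u00C9"); // capital E with acute
        System.out.println(asWrittenToLog(lower));
        System.out.println(asWrittenToLog(upper));
        System.out.println(repair(upper).equals("\u00C9"));
    }
}
```

If this is the cause, the String held in memory is still recoverable for 
both letters (repair works on either), and only a later re-encoding 
through a charset that lacks U+0089 loses the capital.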

I'm quite puzzled by this behavior and am not sure where else to look for 
a solution.  If anyone has any specific/general advice or suggestions of 
things to try, I would be grateful.

Thanks,
Tricia
