Posted to users@cocoon.apache.org by "Lopke, Michael" <mi...@hp.com> on 2005/01/21 18:19:56 UTC

ESQL and utf-8 encoding

Hi,

Has anyone here used ESQL with UTF-8 encoded data? I can connect to my database and retrieve the correct data, but somewhere along the way the data appears to be interpreted as ISO-8859-1. I'm not sure I have all of the configuration correct.

For example, this Chinese character:
電

Shows up as this:
電å


In my sitemap.xmap I have the following:

<map:generators default="file">
         <map:generator label="content,data" logger="sitemap.generator.file" name="file" pool-grow="4" pool-max="32" pool-min="8" src="org.apache.cocoon.generation.FileGenerator"/>
         <map:generator label="content,data" logger="sitemap.generator.serverpages" name="xsp" pool-grow="2" pool-max="32" pool-min="4" src="org.apache.cocoon.generation.ServerPagesGenerator"/>
</map:generators>

…
     <map:serializers default="html">
...
         <map:serializer name="xml"
            src="org.apache.cocoon.serialization.XMLSerializer"
            mime-type="text/xml; charset=utf-8">
            <encoding>UTF-8</encoding>
         </map:serializer>
    </map:serializers>

….
<!--  the XSP pages -->
      <map:match pattern="*.xml">
         <map:generate type="xsp" src="xsp/{1}.xsp"/>
         <map:serialize type="xml"/>
      </map:match>

The snippet in my XSP file looks like this:
...
           <esql:results>
              <esql:row-results>
                 <data>
                    <esql:get-string column="display">
                       <esql:encoding>UTF-8</esql:encoding>
                    </esql:get-string>
                 </data>
              </esql:row-results>
           </esql:results>
…

It looks like the generator is interpreting the data as ISO-8859-1 and passing it down the pipeline that way. If I put the same data into an XML file, use that as my source, and change the encoding declaration at the top to iso-8859-1, I can reproduce the problem.
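
To illustrate the kind of mis-decoding I suspect, here is a minimal standalone Java sketch. It is not Cocoon or ESQL code; the class name MojibakeDemo and the methods misdecode and repair are hypothetical names made up for this example. It only shows what happens when UTF-8 bytes are decoded as ISO-8859-1, and that the damage is reversible as long as no bytes were lost along the way.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Cocoon or the ESQL logicsheet.
final class MojibakeDemo {

    // Encode the text as UTF-8, then decode those bytes with the wrong
    // charset, as a misconfigured layer between database and serializer might.
    public static String misdecode(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Undo the wrong decoding: recover the raw bytes via ISO-8859-1
    // and decode them as the UTF-8 they really were.
    public static String repair(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u96fb"; // the Chinese character from the example above
        String garbled = misdecode(original);

        // One character has become three, one per UTF-8 byte.
        StringBuilder hex = new StringBuilder();
        for (char c : garbled.toCharArray()) {
            hex.append(String.format("%02x ", (int) c));
        }
        System.out.println(hex.toString().trim());            // e9 9b bb
        System.out.println(repair(garbled).equals(original)); // true
    }
}
```

If the output of my pipeline matches what misdecode produces, that would suggest the bytes coming out of the database are intact and only the charset used to decode them is wrong.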

Thanks,
Mike Lopke