Posted to users@cocoon.apache.org by Sandeep N Kundu <c0...@geogr.uni-jena.de> on 2001/03/01 11:54:03 UTC

Unicode characters

Hi all,

I would like to know how to include Unicode characters in my XML content.

For example, compiling the following

<para>Thüringen</para>

gives an error "unicode character found in element para"

The character in question is "ü".

If I comment out the section containing that character, like this:

<!--
<para>Thüringen</para> 
-->

Cocoon still doesn't compile the page, and the error is now "unicode character found in comment".

Can anybody suggest how I can overcome this? I have to deal with the German language in my documentation, so I can't get rid of these characters.

Please help

Sandeep

RE: Unicode characters

Posted by Tapan Nanawati <ta...@yahoo.com>.
Sandeep, are you using an XML declaration like the following?
<?xml version="1.0" encoding="UTF-8"?>

Specifying the encoding tells the XML parser how to read the characters you
are using, so it must name the encoding the file is actually saved in. If
"UTF-8" doesn't work, you can try "UTF-16" or "ISO-8859-1" (Latin-1, which
includes the German umlauts).
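A minimal sketch of why the declaration matters. It is written in Python only because its standard-library parser makes the behaviour easy to show. Cocoon itself uses a Java parser, but any conforming XML parser follows the same rules.

The sketch assumes the file was saved as ISO-8859-1, where "ü" is the single byte 0xFC. With no declaration the parser assumes UTF-8, in which 0xFC is never valid, so parsing fails. It fails inside a comment too, because the parser checks the bytes against the encoding everywhere, comments included. The helper name parses() is made up for this illustration.

```python
import xml.etree.ElementTree as ET

def parses(data: bytes) -> bool:
    """Hypothetical helper: True if the parser accepts `data` as well-formed XML."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False

body = "<para>Thüringen</para>"
latin1 = body.encode("iso-8859-1")  # "ü" is stored as the single byte 0xFC
utf8 = body.encode("utf-8")         # "ü" is stored as the two bytes 0xC3 0xBC

# No declaration: the parser assumes UTF-8, so the Latin-1 byte 0xFC is rejected.
print(parses(latin1))  # False
print(parses(utf8))    # True

# The same bytes are rejected inside a comment as well.
print(parses(b"<doc><!--" + latin1 + b"--></doc>"))  # False

# Declaring the encoding the bytes are really in fixes the Latin-1 file.
declared = b'<?xml version="1.0" encoding="ISO-8859-1"?>' + latin1
print(ET.fromstring(declared).text)  # Thüringen
```

The same reasoning applies to the UTF-16 suggestion: a declaration only helps when it names the encoding the bytes are really in.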

Otherwise, I will suggest something else.
Tapan Nanawati
tapan_nanawati@yahoo.com
91-11-6685274 (o)
91-98112-98982 (m)
New Delhi - INDIA
------------------------------
God is real, unless declared integer.

  -----Original Message-----
  From: Sandeep N Kundu [mailto:c0kusa@geogr.uni-jena.de]
  Sent: Thursday, March 01, 2001 4:24 PM
  To: cocoon-users@xml.apache.org
  Subject: Unicode characters



Re: Unicode characters

Posted by Werner Guttmann <We...@msdw.com>.
Hi Eduardo,

Eduardo Yánez wrote:

> > Hi,
> Hi Werner
>
> > I am just about to change encoding for some XSP pages, too, as I need to
> > support Japanese clients as well.
> Do you need to support two encodings (Western European
> and Japanese) at the same time?

Well, based on the location of the client (e.g. London, New York,
Tokyo), I need to send the same output in a different character
encoding (e.g. 8859_1 for clients in London and New York, and a Japanese
encoding for clients in Tokyo). For Tokyo, I definitely need to send
Japanese characters (which 8859_1 cannot represent) to their browsers.
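For the per-location case, here is a minimal Python sketch of choosing an output charset per client. The location table, the function name and the choice of Shift_JIS are all hypothetical; EUC-JP and ISO-2022-JP are other common Japanese encodings.

It shows the underlying constraint. ISO-8859-1 (which Java also calls 8859_1) has no bytes for Japanese characters, so Tokyo pages need either a Japanese encoding or UTF-8. UTF-8 could serve all three cities.

```python
# Hypothetical table: client location -> charset used for the response body.
CHARSET_BY_LOCATION = {
    "London": "ISO-8859-1",
    "New York": "ISO-8859-1",
    "Tokyo": "Shift_JIS",
}

def encode_for_client(text, location):
    """Return (charset, bytes) for `location`, falling back to UTF-8.

    Raises UnicodeEncodeError if the chosen charset cannot represent `text`.
    """
    charset = CHARSET_BY_LOCATION.get(location, "UTF-8")
    return charset, text.encode(charset)

print(encode_for_client("Thüringen", "London"))  # ('ISO-8859-1', b'Th\xfcringen')
print(encode_for_client("東京", "Tokyo")[0])      # Shift_JIS

try:
    encode_for_client("東京", "London")
except UnicodeEncodeError:
    print("ISO-8859-1 cannot represent Japanese text")
```

Whatever charset is chosen also has to be announced to the browser, for example in a Content-Type header such as text/html; charset=Shift_JIS. Otherwise the browser will decode the bytes with the wrong table.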

> Honestly, I don't know how to do that. In XML I have
> only one entry for the encoding, in the XML file's
> prolog <?xml ...?>, and in Cocoon I only know about one
> place and one encoding, in the cocoon.properties file.

That's what I gathered after reading through the documentation and this
thread yesterday. All my documents now have the following header:

<?xml version="1.0" encoding="UTF-8"?>

> > # This encoding should also be used in:
> > #   - The XSP document <?xml?> declaration
> > #   - The "encoding" configuration property of the formatter to be used
> > # Example: Russian uses "Cp1251"
> > processor.xsp.encoding = UTF-8
> The cocoon.properties file says that this is the
> "default encoding"; maybe we can change this in a
> producer.
>
> > and my documents now start with
> >
> > <?xml version="1.0" encoding="UTF-8"?>
> >
> > Bounced Tomcat, reloaded my application, and then I looked at the output
> > of a sample page where I show the character encoding of the page via the
> > <response:get-character-encoding /> element. To my surprise, it still
> > shows 8859_1, which, as far as I recall, is the Western European
> > encoding. Any idea what's going wrong?
> Did you remove the Cocoon cache where all the Cocoon
> producers are stored? Remove the cache in order to
> force Cocoon to generate the producers again. I don't
> know, but it's possible that the old .class files were
> being executed.

Yes, indeed. I found things far more stable when I removed the repository
directory generated by Cocoon while restarting the servlet container.

> Regards,
> Eduardo.
>
> _________________________________________________________
> Do You Yahoo!?
> Get your free @yahoo.com e-mail address
> at http://correo.espanol.yahoo.com


Re: Unicode characters

Posted by Eduardo Yánez <ej...@yahoo.com>.
> Hi,
Hi Werner

> I am just about to change encoding for some XSP pages, too, as I need to
> support Japanese clients as well.
Do you need to support two encodings (Western European
and Japanese) at the same time?

Honestly, I don't know how to do that. In XML I have
only one entry for the encoding, in the XML file's
prolog <?xml ...?>, and in Cocoon I only know about one
place and one encoding, in the cocoon.properties file.

> # This encoding should also be used in:
> #   - The XSP document <?xml?> declaration
> #   - The "encoding" configuration property of the formatter to be used
> # Example: Russian uses "Cp1251"
> processor.xsp.encoding = UTF-8
The cocoon.properties file says that this is the
"default encoding"; maybe we can change this in a
producer.

> and my documents now start with
> 
> <?xml version="1.0" encoding="UTF-8"?>
> 
> Bounced Tomcat, reloaded my application, and then I looked at the output
> of a sample page where I show the character encoding of the page via the
> <response:get-character-encoding /> element. To my surprise, it still
> shows 8859_1, which, as far as I recall, is the Western European
> encoding. Any idea what's going wrong?
Did you remove the Cocoon cache where all the Cocoon
producers are stored? Remove the cache in order to
force Cocoon to generate the producers again. I don't
know, but it's possible that the old .class files were
being executed.
 
Regards,
Eduardo.


Re: Unicode characters

Posted by Werner Guttmann <We...@msdw.com>.
Hi,

I am just about to change the encoding for some XSP pages, too, as I need to
support Japanese clients as well. I followed the instructions below (as
well as the ones provided in cocoon.properties) and made the relevant
changes:

# Default encoding to be used for code generation and compilation
# If omitted, the platform's default encoding will be used
# This encoding should also be used in:
#   - The XSP document <?xml?> declaration
#   - The "encoding" configuration property of the formatter to be used
# Example: Russian uses "Cp1251"
processor.xsp.encoding = UTF-8

and my documents now start with

<?xml version="1.0" encoding="UTF-8"?>
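The cocoon.properties comment asks for the same encoding in several places, so a small check can catch a mismatch before Cocoon does. Below is a minimal Python sketch; all helper names are hypothetical. It reads processor.xsp.encoding from the text of a properties file and reads the encoding pseudo-attribute from an XML declaration. It then compares the two after normalising aliases such as UTF-8/utf8.

Limitations:
- It only understands simple key = value lines, not the full Java properties syntax (':' separators, escapes, line continuations).
- It does not look at the formatter setting.

```python
import codecs
import re

def read_property(text, key):
    """Return the value of `key` from simple `key = value` lines, or None."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue  # blank line or comment
        name, sep, value = line.partition("=")
        if sep and name.strip() == key:
            return value.strip()
    return None

# Matches the encoding pseudo-attribute of a leading <?xml ...?> declaration.
DECL = re.compile(r'^<\?xml[^>]*?\sencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']')

def declared_encoding(xml_text):
    """Return the encoding named in the XML declaration, or None if absent."""
    m = DECL.match(xml_text)
    return m.group(1) if m else None

def same_encoding(a, b):
    """Compare two charset names by their canonical Python codec name."""
    return codecs.lookup(a).name == codecs.lookup(b).name

props = """
# Example: Russian uses "Cp1251"
processor.xsp.encoding = UTF-8
"""
doc = '<?xml version="1.0" encoding="utf8"?>\n<page/>'

xsp = read_property(props, "processor.xsp.encoding")
decl = declared_encoding(doc)
print(xsp, decl, same_encoding(xsp, decl))  # UTF-8 utf8 True
```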

Bounced Tomcat, reloaded my application, and then I looked at the output
of a sample page where I show the character encoding of the page via the
<response:get-character-encoding /> element. To my surprise, it still
shows

8859_1

which, as far as I recall, is the Western European encoding (ISO-8859-1).
Any idea what's going wrong?

Thanks
Werner

Eduardo Yánez wrote:

> In the XSP processor section of the cocoon.properties:
>
> ##########################################
> # XSP Processor                          #
> ##########################################
>
> ....
> # Default encoding to be used for code generation and compilation
> # If omitted, the platform's default encoding will be used
> # This encoding should also be used in:
> #   - The XSP document <?xml?> declaration
> #   - The "encoding" configuration property of the formatter to be used
> # Example: Russian uses "Cp1251"
> processor.xsp.encoding = ISO-8859-1
>
> ....
>
> and, of course, the encoding of your XML files!
>
> Bye,
> Eduardo
>
>
> ---------------------------------------------------------------------
> Please check that your question has not already been answered in the
> FAQ before posting. <http://xml.apache.org/cocoon/faqs.html>
>
> To unsubscribe, e-mail: <co...@xml.apache.org>
> For additional commands, e-mail: <co...@xml.apache.org>


Re: Unicode characters

Posted by Eduardo Yánez <ej...@yahoo.com>.
In the XSP processor section of the cocoon.properties:

##########################################
# XSP Processor                          #
##########################################

....
# Default encoding to be used for code generation and compilation
# If omitted, the platform's default encoding will be used
# This encoding should also be used in:
#   - The XSP document <?xml?> declaration
#   - The "encoding" configuration property of the formatter to be used
# Example: Russian uses "Cp1251"
processor.xsp.encoding = ISO-8859-1

....

and, of course, the encoding of your XML files!
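Setting the encoding of the XML files themselves involves two things that must agree: the name in the <?xml ...?> declaration and the bytes the editor actually writes. Here is a minimal Python sketch using a hypothetical write_xml() helper. It derives both from a single parameter so they cannot drift apart. ISO-8859-1 is used as the default only to match the property value above.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

def write_xml(path, body, encoding="ISO-8859-1"):
    """Hypothetical helper: save `body` with a declaration naming `encoding`,
    encoding the whole file in that same charset."""
    text = f'<?xml version="1.0" encoding="{encoding}"?>\n{body}\n'
    Path(path).write_bytes(text.encode(encoding))

with tempfile.TemporaryDirectory() as tmp:
    for enc in ("ISO-8859-1", "UTF-8"):
        path = Path(tmp) / f"page-{enc}.xml"
        write_xml(path, "<para>Thüringen</para>", enc)
        # Both files parse back to the same text, whichever encoding was chosen.
        print(enc, ET.parse(path).getroot().text)
```

An editor that saves in the platform default encoding while the declaration names something else produces exactly the kind of errors reported at the start of this thread.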

Bye, 
Eduardo
