Posted to users@cocoon.apache.org by thorsten schmid <ts...@novocron.de> on 2002/07/12 14:17:16 UTC

encoding problem with xslt

Hello,

I have a problem with XSLT and the transformation of non-English, 
ISO-8859-1 encoded characters such as the German umlauts (e.g. &uuml;, a u with 
two dots). I am using Cocoon 2.1-dev and Tomcat 4.0.1 with JDK 1.3.1. The 
transformation works fine as long as I don't use the <xsl:attribute ...> 
element.

================================================================
my xml:
<?xml version="1.0" encoding="ISO-8859-1"?>
<root>
             :
             :
   <c color="blue" sourcefile="foo.xml">Integrationsämter</c>
             :
             :
</root>
================================================================

================================================================
my xsl:
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="html" omit-xml-declaration="yes" encoding="ISO-8859-1"/>
             :
             :
<xsl:template match='c[@color="blue"]'>
  <xsl:element name="a">
    <xsl:attribute  name="href">
     frameset.xsp?filename=<xsl:value-of   
select="@sourcefile"/>&amp;searchstring=<xsl:value-of   
disable-output-escaping="yes" select="."/>
    </xsl:attribute>
    <xsl:value-of select="."/>
  </xsl:element>
</xsl:template>
             :
             :
</xsl:stylesheet>
================================================================

================================================================
sitemap.xmap:
<?xml version="1.0"?>
<map:sitemap xmlns:map="http://apache.org/cocoon/sitemap/1.0">
             :
             :
    <map:transformers default="xslt">
      <map:transformer name="xslt"
        src="org.apache.cocoon.transformation.TraxTransformer">
         <encoding>ISO-8859-1</encoding>
      </map:transformer>
      <map:transformer name="xslt-with-parameters"
       src="org.apache.cocoon.transformation.TraxTransformer">
       <use-request-parameters>true</use-request-parameters>
       <encoding>ISO-8859-1</encoding>
      </map:transformer>
    </map:transformers>
             :
             :
</map:sitemap>
================================================================

================================================================
output:
<a href="frameset.xsp?filename=foo.xml&searchstring=Integrations%C3%A4mter">
Integrations&auml;mter
</a>               .
================================================================

================================================================
desired output:
<a href="frameset.xsp?filename=foo.xml&searchstring=Integrations&auml;mter"> 
Integrations&auml;mter
</a>               .
================================================================

As you can see the output is generated to be passed to an XSP file via 
HTTP-GET.
In cocoon 1.8.2 the desired output was generated.

What I have tried so far:
- using the "disable-output-escaping" attribute in the <xsl:value-of ...> 
element -> no success
- setting the encoding in the <xsl:output ...> element -> no success
- setting the encoding in the sitemap <map:transformer ...> -> no success
- spelling the string 'iso-8859-1' in uppercase and lowercase letters

Is there any way to generate the desired output with the current 
version of Cocoon?

Thanks in advance,
Thorsten
-- 

Thorsten Schmid

NovoCron Technologies
Am Steg 3, 89231 Neu-Ulm
Fon: +49-731-9723757
Fax: +49-731-9723818
Mobil: +49-170-3021585
mailto:tschmid@novocron.de
www.novocron.de


---------------------------------------------------------------------
Please check that your question  has not already been answered in the
FAQ before posting.     <http://xml.apache.org/cocoon/faq/index.html>

To unsubscribe, e-mail:     <co...@xml.apache.org>
For additional commands, e-mail:   <co...@xml.apache.org>


Re: encoding problem with xslt

Posted by Joerg Heinicke <jo...@gmx.de>.
Hello Thorsten,

There was a bug in Xalan with URL encoding more than half a year ago, but I 
don't know its current status.

> <xsl:template match='c[@color="blue"]'>
>   <xsl:element name="a">
>     <xsl:attribute  name="href">
>      frameset.xsp?filename=<xsl:value-of   
> select="@sourcefile"/>&amp;searchstring=<xsl:value-of   
> disable-output-escaping="yes" select="."/>
>     </xsl:attribute>
>     <xsl:value-of select="."/>
>   </xsl:element>
> </xsl:template>
> </xsl:stylesheet>

You can remove disable-output-escaping, because it has no effect in Cocoon 
and should generally not be used anyway, since it is an optional feature in XSLT.

Furthermore you can rewrite your code as

<a href="frameset.xsp?filename={@sourcefile}&amp;searchstring={.}">
   <xsl:value-of select="."/>
</a>

That may be more readable.


> ================================================================
> output:
> <a href="frameset.xsp?filename=foo.xml&searchstring=Integrations%C3%A4mter">
> Integrations&auml;mter
> </a>               .
> ================================================================

That doesn't look bad to me. I don't know exactly which %XX sequence the a 
umlaut should be converted to, or whether it should be one %XX or two, but it 
doesn't look wrong.

> ================================================================
> desired output:
> <a href="frameset.xsp?filename=foo.xml&searchstring=Integrations&auml;mter"> 
> Integrations&auml;mter
> </a>               .
> ================================================================

This is definitely not correct. You can't use an entity in a URL.

> - using the "disable-output-escaping" attribute in the <xsl: value-of ...> 
> element -> no success

deactivated in Cocoon

> -setting the encoding in the <xsl:output ...> element -> no success

deactivated in Cocoon; you do this in the sitemap, as you correctly did

> -setting the encoding in the sitemap <map:transformer ...> -> no success

Definitely not at <map:transformer>, but at <map:serializer>. What I don't know 
is whether it works in the pipeline or only at <map:serializer> in <map:components>.

> -spelling the string 'iso-8859-1' in uppercase and lowerscase letters

makes no difference, at least with Xalan.

> Is there any possibility generating the desired output using the current 
> version of cocoon?
> 
> Thanks in advance,
> Thorsten

Regards,

Joerg

-- 

System Development
VIRBUS AG
Fon  +49(0)341-979-7419
Fax  +49(0)341-979-7409
joerg.heinicke@virbus.de
www.virbus.de




Re: encoding problem with xslt

Posted by Jens Lorenz <je...@interface-projects.de>.
----- Original Message -----
From: "Joerg Heinicke" <jo...@gmx.de>
To: <co...@xml.apache.org>
Sent: Friday, July 12, 2002 5:23 PM
Subject: Re: encoding problem with xslt


> Hmm, I didn't test it a long time and didn't find a correlating bug by a
> short view on Xalan bug list. It was a bug in our application that "mü"
> was "transformed" to "mÃ¼".
> (For people with different encoding: u umlaut ==> A+~ and 1/4.)
> I had in mind (and written in our bugzilla) that it was a Xalan bug,
> maybe that's wrong. It sounds a bit like the description of the original
> post on this thread. At least we solved it with POST form.
>
> Joerg
>


Thank you very much Joerg. This smells a lot like UTF-8 encoding.


<java>

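  // "mÃ¼" (m, U+00C3, U+00BC) is what "mü" looks like when its UTF-8
  // bytes are read as ISO-8859-1
  String s = "m\u00C3\u00BC";
  // get the raw bytes back, one byte per character
  byte[] data = s.getBytes("ISO-8859-1");
  // decode those bytes as UTF-8, which is what they really are
  String decoded = new String(data, "UTF-8");

  System.out.println(decoded);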

</java>

gives mü. So Xalan encoded your string as UTF-8. This is the
recommended encoding for URIs (see RFC 2718), so this is not a
bug in Xalan.
And this leads to the real problem: the URL is encoded as
UTF-8, while the servlet container decodes it as ISO-8859-1.
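
The same effect can be seen at the byte level. What follows is only a minimal sketch, assuming a Java 7 or later runtime for StandardCharsets (so not the JDK 1.3.1 mentioned in the original post). The class name MojibakeDemo and its helper methods are made up for illustration. Non-ASCII characters are written as \u escapes so the result does not depend on the encoding of the source file.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Render a byte array as uppercase hex, two digits per byte.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Encode text as UTF-8, then (wrongly) decode those bytes as ISO-8859-1.
    static String utf8ReadAsLatin1(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "m\u00FC"; // "m" followed by u umlaut

        // u umlaut is two bytes (C3 BC) in UTF-8 but one byte (FC) in ISO-8859-1.
        System.out.println(hex(original.getBytes(StandardCharsets.UTF_8)));      // 6DC3BC
        System.out.println(hex(original.getBytes(StandardCharsets.ISO_8859_1))); // 6DFC

        // Reading the two UTF-8 bytes as ISO-8859-1 yields two characters,
        // U+00C3 and U+00BC, which is the "A+~ and 1/4" described above.
        String misread = utf8ReadAsLatin1(original);
        System.out.println(misread.length());                  // 3
        System.out.println(misread.equals("m\u00C3\u00BC"));   // true
    }
}
```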

Solution: ?



Jens

--

jens.lorenz at interface-projects dot de

interface:projects GmbH                             \\|//
Tolkewitzer Strasse 49                              (o o)
01277 Dresden                               ~~~~oOOo~(_)~oOOo~~~~
Germany




Re: encoding problem with xslt

Posted by Joerg Heinicke <jo...@gmx.de>.
Hmm, I haven't tested it for a long time and didn't find a matching bug in a 
quick look at the Xalan bug list. The bug in our application was that "mü" was 
"transformed" to "mÃ¼".
(For people with a different encoding: u umlaut ==> A+~ and 1/4.)
I had in mind (and had written in our Bugzilla) that it was a Xalan bug; maybe 
that's wrong. It sounds a bit like the description in the original post of 
this thread. At least we solved it with a POST form.

Joerg

Jens Lorenz wrote:
> ----- Original Message -----
> From: "Joerg Heinicke" <jo...@gmx.de>
> To: <co...@xml.apache.org>
> Sent: Friday, July 12, 2002 4:45 PM
> Subject: Re: encoding problem with xslt
> 
> 
> 
>>>If anyone has some more ideas on this topic (non-ISO-8859-1 characters
>>>within URIs), I would greatly appreciate some more input.
>>>Conclusion for me is to avoid such characters in URIs. But this does
>>>not get easily into the heads of our customers and users. (e.g. file
>>>names)
>>
>>According to the old Xalan bug, we used forms with javascript. But this
>>doesn't solve the problem generally, only our special use case.
>>
>>Joerg
> 
> Joerg,
> 
> 
> which Xalan bug are you referring to ? I browsed the list of open bugs,
> but only found related bugs.
> IMHO this is not a bug, but a lack of specification. W3C has a draft
> about an IRI (internationalized URI), but until this gets adopted
> and implemented we'll have to deal with the mess.
> 
> 
> 
> Jens


-- 

System Development
VIRBUS AG
Fon  +49(0)341-979-7419
Fax  +49(0)341-979-7409
joerg.heinicke@virbus.de
www.virbus.de




Re: encoding problem with xslt

Posted by Jens Lorenz <je...@interface-projects.de>.
----- Original Message -----
From: "Joerg Heinicke" <jo...@gmx.de>
To: <co...@xml.apache.org>
Sent: Friday, July 12, 2002 4:45 PM
Subject: Re: encoding problem with xslt


> > If anyone has some more ideas on this topic (non-ISO-8859-1 characters
> > within URIs), I would greatly appreciate some more input.
> > Conclusion for me is to avoid such characters in URIs. But this does
> > not get easily into the heads of our customers and users. (e.g. file
> > names)
>
> According to the old Xalan bug, we used forms with javascript. But this
> doesn't solve the problem generally, only our special use case.
>
> Joerg
>


Joerg,


Which Xalan bug are you referring to? I browsed the list of open bugs,
but found only related ones.
IMHO this is not a bug, but a lack of specification. The W3C has a draft
on IRIs (internationalized URIs), but until it is adopted
and implemented we'll have to deal with the mess.



Jens

--

jens.lorenz at interface-projects dot de

interface:projects GmbH                             \\|//
Tolkewitzer Strasse 49                              (o o)
01277 Dresden                               ~~~~oOOo~(_)~oOOo~~~~
Germany




Re: encoding problem with xslt

Posted by Joerg Heinicke <jo...@gmx.de>.
> If anyone has some more ideas on this topic (non-ISO-8859-1 characters
> within URIs), I would greatly appreciate some more input.
> Conclusion for me is to avoid such characters in URIs. But this does
> not get easily into the heads of our customers and users. (e.g. file
> names)

Because of the old Xalan bug, we used forms with JavaScript. But this 
doesn't solve the problem in general, only our special use case.

Joerg

-- 

System Development
VIRBUS AG
Fon  +49(0)341-979-7419
Fax  +49(0)341-979-7409
joerg.heinicke@virbus.de
www.virbus.de




Re: encoding problem with xslt

Posted by Jens Lorenz <je...@interface-projects.de>.
----- Original Message ----- 
From: "Vadim Gritsenko" <va...@verizon.net>
To: <co...@xml.apache.org>
Sent: Friday, July 12, 2002 5:05 PM
Subject: RE: encoding problem with xslt


<snip/>

> > Thanks for you input Vadim. But do not only think of web sites. But
> > also of web applications. Think of a web cms for maintaining html
> > content.
> > Avoiding non-ISO characters ist
>                               ^^^
> :)

Oops. This happens when somebody near you forces you to think
in two languages at once (one written, one spoken) ... sorry.
But now you know the German equivalent of "is" (if you didn't
know it yet).

> > impossible for non-english web sites.
> > 
> > Fortunately POST method is immune to such issues. So the only option
> > is to carry these characters via POST back to Cocoon.
> 
> It is also the safest way and compatible among all browsers/platforms
> (if platform supports this encoding, of course).
> 
> 
> Take care,
> 
> Vadim
> 

Jens

-- 

jens.lorenz at interface-projects dot de

interface:projects GmbH                             \\|//
Tolkewitzer Strasse 49                              (o o)
01277 Dresden                               ~~~~oOOo~(_)~oOOo~~~~
Germany




embedded svg

Posted by Peter Sparkes <pe...@didm.co.uk>.
I have an xsl.fo file with embedded svg.

I want to convert the file to pdf.

It works fine without the svg.

It works fine with standalone FOP.

When I run it in Cocoon 2.1-dev I get nothing: no errors, just a blank
screen.


Help please.

Peter





RE: encoding problem with xslt

Posted by Vadim Gritsenko <va...@verizon.net>.
> From: Jens Lorenz [mailto:jens.lorenz@interface-projects.de]
> 
> ----- Original Message -----
> From: "Vadim Gritsenko" <va...@verizon.net>
> To: <co...@xml.apache.org>
> Sent: Friday, July 12, 2002 4:36 PM
> Subject: RE: encoding problem with xslt
> 
> 
> > > From: Jens Lorenz [mailto:jens.lorenz@interface-projects.de]
> > ...
> > > If anyone has some more ideas on this topic (non-ISO-8859-1
> > > characters within URIs), I would greatly appreciate some more input.
> >
> > IMHO, non-ascii characters in URIs should be avoided by all means
> > possible.
> >
> >
> > Less issues for you *and* for visitors of your site. Issues with
> > non-ascii characters in URIs are endless (I bet you have not thought
> > about visitors of your site exchanging bookmarks/URIs, and their
> > systems have different encodings)
> >
> >
> > Vadim
> 
> 
> Thanks for you input Vadim. But do not only think of web sites. But
> also of web applications. Think of a web cms for maintaining html
> content.
> Avoiding non-ISO characters ist
                              ^^^
:)


> impossible for non-english web sites.
> 
> Fortunately POST method is immune to such issues. So the only option
> is to carry these characters via POST back to Cocoon.

It is also the safest way and compatible among all browsers/platforms
(if platform supports this encoding, of course).


Take care,

Vadim


> Jens
> 
> --
> 
> jens.lorenz at interface-projects dot de
> 
> interface:projects GmbH                             \\|//
> Tolkewitzer Strasse 49                              (o o)
> 01277 Dresden                               ~~~~oOOo~(_)~oOOo~~~~
> Germany




Re: encoding problem with xslt

Posted by Jens Lorenz <je...@interface-projects.de>.
----- Original Message -----
From: "Vadim Gritsenko" <va...@verizon.net>
To: <co...@xml.apache.org>
Sent: Friday, July 12, 2002 4:36 PM
Subject: RE: encoding problem with xslt


> > From: Jens Lorenz [mailto:jens.lorenz@interface-projects.de]
> >
> ...
> > If anyone has some more ideas on this topic (non-ISO-8859-1 characters
> > within URIs), I would greatly appreciate some more input.
>
> IMHO, non-ascii characters in URIs should be avoided by all means
> possible.
>
>
> Less issues for you *and* for visitors of your site. Issues with
> non-ascii characters in URIs are endless (I bet you have not thought
> about visitors of your site exchanging bookmarks/URIs, and their systems
> have different encodings)
>
>
> Vadim


Thanks for your input, Vadim. But don't think only of web sites; think
also of web applications, for example a web CMS for maintaining HTML
content.
Avoiding non-ISO characters ist impossible for non-english web sites.

Fortunately the POST method is immune to such issues. So the only option
is to carry these characters back to Cocoon via POST.


Jens

--

jens.lorenz at interface-projects dot de

interface:projects GmbH                             \\|//
Tolkewitzer Strasse 49                              (o o)
01277 Dresden                               ~~~~oOOo~(_)~oOOo~~~~
Germany




RE: encoding problem with xslt

Posted by Vadim Gritsenko <va...@verizon.net>.
> From: Jens Lorenz [mailto:jens.lorenz@interface-projects.de]
> 
...
> If anyone has some more ideas on this topic (non-ISO-8859-1 characters
> within URIs), I would greatly appreciate some more input.

IMHO, non-ASCII characters in URIs should be avoided by all means
possible.


Fewer issues for you *and* for visitors of your site. Issues with
non-ASCII characters in URIs are endless (I bet you have not thought
about visitors of your site exchanging bookmarks/URIs when their systems
have different encodings).


Vadim


> Conclusion for me is to avoid such characters in URIs. But this does
> not get easily into the heads of our customers and users. (e.g. file
> names)
> 
> 
> 
> Best Regards,
> 
> 
> Jens
> 
> --
> 
> jens.lorenz at interface-projects dot de
> 
> interface:projects GmbH                             \\|//
> Tolkewitzer Strasse 49                              (o o)
> 01277 Dresden                               ~~~~oOOo~(_)~oOOo~~~~
> Germany




Re: encoding problem with xslt

Posted by Jens Lorenz <je...@interface-projects.de>.
----- Original Message -----
From: "thorsten schmid" <ts...@novocron.de>
To: <co...@xml.apache.org>
Cc: <tw...@novocron.de>
Sent: Friday, July 12, 2002 2:17 PM
Subject: encoding problem with xslt


Hi Thorsten,

<snip/>

> ================================================================
> output:
> <a href="frameset.xsp?filename=foo.xml&searchstring=Integrations%C3%A4mter">
> Integrations&auml;mter
> </a>               .
> ================================================================
>


This output is certainly correct. URIs generated via the HTML output
method of Xalan or Saxon are UTF-8 encoded (and äöü are 2 bytes wide
when using UTF-8). This is recommended by RFC 2718.

The problem is not Cocoon, but the Servlet spec. Tomcat's default
encoding is ISO-8859-1, so your URI is decoded as ISO-8859-1.
This obviously breaks your Cocoon servlet later.
Since the HTTP protocol does not send an encoding with the URI, there is
also no chance for Tomcat to detect the encoding of the URI. And
even worse, request.setCharacterEncoding() affects only parameters (in a GET
request; POST is immune, since during a POST the encoding is sent by the browser).
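
The mismatch described above can be reproduced outside of Tomcat. The following is only a sketch under assumptions: java.net.URLEncoder and java.net.URLDecoder with an explicit charset name (available since Java 1.4, so not in the JDK 1.3.1 from the original post) stand in for what the XSLT processor and the servlet container do internally, and the class name QueryEncodingDemo and its helpers are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class QueryEncodingDemo {

    // Percent-encode a query value using the named charset.
    static String encode(String value, String charset) {
        try {
            return URLEncoder.encode(value, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unknown charset: " + charset, e);
        }
    }

    // Decode a percent-encoded query value using the named charset.
    static String decode(String value, String charset) {
        try {
            return URLDecoder.decode(value, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        String term = "Integrations\u00E4mter"; // a umlaut written as an escape

        // What ends up in the href when the URI is encoded as UTF-8.
        String sent = encode(term, "UTF-8");
        System.out.println(sent);                        // Integrations%C3%A4mter

        // What an ISO-8859-1 based encoder would have produced instead.
        System.out.println(encode(term, "ISO-8859-1"));  // Integrations%E4mter

        // A container that assumes ISO-8859-1 turns the two bytes into two characters.
        String seen = decode(sent, "ISO-8859-1");
        System.out.println(seen.equals("Integrations\u00C3\u00A4mter")); // true

        // Decoding with the charset that was actually used restores the term.
        System.out.println(decode(sent, "UTF-8").equals(term));          // true
    }
}
```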


You have three options:

The first is to set the encoding of your servlet container to UTF-8. For
Tomcat you do this by setting CATALINA_OPTS to "-Dfile.encoding=UTF-8". But
beware that this might break your existing plain text files, which are most
probably ISO-8859-1 encoded. With XML files this is no problem, as long as
you specify their encoding correctly.

The second option is to manually recode the URI within Cocoon via some
custom code. But this is somewhat "hacky".

The third option, and probably the best, is to use a servlet filter in
front of Cocoon which does the transformation of character encodings for
you. This way, you don't have to break text files read and written
by the Tomcat JVM, and you can still use full UTF-8 within your URIs.
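
The recoding step behind the second and third options can be sketched in plain Java. This is an illustration only, not Cocoon or Tomcat code: the class ParameterRecoder and its method recode are hypothetical names, StandardCharsets assumes Java 7 or later, and how the helper would be wired into a servlet filter or into Cocoon is deliberately left out. It relies on ISO-8859-1 mapping every byte to exactly one character, so the original bytes can be recovered without loss.

```java
import java.nio.charset.StandardCharsets;

public class ParameterRecoder {

    // Undo a wrong ISO-8859-1 decode of bytes that were really UTF-8.
    // Assumption: the input was produced by decoding UTF-8 bytes as ISO-8859-1.
    // A value that was decoded correctly in the first place would be damaged.
    static String recode(String misdecoded) {
        if (misdecoded == null) {
            return null;
        }
        // Each character below U+0100 maps back to the single byte it came from.
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        // Interpret those bytes with the charset the sender actually used.
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The search string as an ISO-8859-1 container would hand it over.
        String fromContainer = "Integrations\u00C3\u00A4mter";

        String fixed = recode(fromContainer);
        System.out.println(fixed.equals("Integrations\u00E4mter")); // true

        // Pure ASCII values pass through unchanged.
        System.out.println(recode("foo.xml"));                      // foo.xml
    }
}
```

Because the helper cannot tell whether a value was misdecoded, it would only be safe to apply where every incoming URI is known to be UTF-8 encoded.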



If anyone has more ideas on this topic (non-ISO-8859-1 characters
within URIs), I would greatly appreciate some more input.
My conclusion is to avoid such characters in URIs. But this is
not easy to get across to our customers and users (e.g. for file
names).



Best Regards,


Jens

--

jens.lorenz at interface-projects dot de

interface:projects GmbH                             \\|//
Tolkewitzer Strasse 49                              (o o)
01277 Dresden                               ~~~~oOOo~(_)~oOOo~~~~
Germany

