Posted to dev@xalan.apache.org by Daniel Lopez <D....@uib.es> on 2000/09/26 10:22:19 UTC

Another bug with encoding="..."

Hi,

I've found another bug regarding the use of encoding="iso-8859-1". It's
NOT the old bug regarding value-of... vs. {@...}. I already had to fight
against this one and it was solved in Xalan 1.2D02 (Thanks Paul for your
help) ;).
But now I have found another one. It's an odd one because it doesn't
always show up, but I have been able to create a test case. The problem
is the following: sometimes, when I use <xsl:value-of ...> inside a
<script> tag, the special characters are not translated properly, as if
the encoding were not being taken into account. I don't know if this
happens in other kinds of tags, and I haven't been able to find the
reason why it happens with the script tag and not with other tags. In
any case, here is the test case:

test.xml
------------------------------------------------
<?xml version="1.0" encoding="iso-8859-1"?>
<TEST VALUE="áéíóúñç"/>

test.xslt
------------------------------------------------
<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
	<xsl:output method="html" indent="yes"/>
	<xsl:template match="/">
		<html>
			<script language="JavaScript">
			//
			function TestFunc()
			{
			 content =  ' ' +
				<xsl:apply-templates select="TEST"/>
				' end.'
			}
			//
			</script>
			<body>
				<xsl:apply-templates select="TEST"/>
			</body>
		</html>
	</xsl:template>
	<xsl:template match="TEST">
	'Value is <xsl:value-of select="@VALUE"/>' + 
	</xsl:template>
</xsl:stylesheet>

Result.html
------------------------------------------------
<html>
    <script language="JavaScript">
			//
			function TestFunc()
			{
			 content =  ' ' +
				
--1-->	'Value is áéíóúñç' + 
	
				' end.'
			}
			//
			</script>
	<body bgcolor="#FFFFFF">
--2-->	'Value is
&aacute;&eacute;&iacute;&oacute;&uacute;&ntilde;&ccedil;' + 
	</body>
</html>


Note that the first time (--1-->), inside the script tag, the value is
translated incorrectly, and the second time (--2-->) it is translated
correctly. This doesn't happen if I use a tag other than <script>.

Am I doing something wrong? Is the script tag treated in a different way
for any reason?

Environment:
J-Xalan 1.2D02 (and xerces included with it)
Win NT
JDK1.3

Thanks in advance,
Dan
-------------------------------------------
Daniel Lopez Janariz (D.Lopez@uib.es)
Web Services
Computer Center
Balearic Islands University
-------------------------------------------

Re: Another bug with encoding="..."

Posted by Daniel Lopez <D....@uib.es>.
Hi again,

> There are two encodings at work in the stylesheet.  The encoding
> pseudoattribute in the XML Declaration at the beginning of the
> stylesheet specifies the encoding used for the stylesheet itself.  In
> your original email, I do see that you specified this.

You are absolutely right. I must admit that I thought there was just one
place to specify the encoding, but now I see that one is for the encoding
of the XSL sheet and the other is for the encoding of the output. Thanks
for the clarification.

> However, there is _also_ an encoding attribute for the xsl:output
> element.  That is the one that I mentioned in my previous reply.  This
> did NOT show up in your example.  This attribute covers the encoding to
> be used for the OUTPUT document.  If you add this, you'll see that the
> content exactly matches what is shown in your input XML when in the
> <script> tag.
> The output encoding is _not_ being ignored in the <script> tag.  Since
> you did not specify the encoding attribute on the xsl:output element,
> it defaulted to UTF-8 and this is the encoding that you see.  If you
> change your stylesheet xsl:output element to explicitly specify
> encoding="iso-8859-1", you'll see your correct characters inside the
> script.  Inside the script tag, characters outside the ASCII range are
> _not_ changed to character entity references.  This is special behavior
> for the script tag and is usually appropriate.  Is there a problem with
> having the literal value shown in the script portion of your output
> html document?  What would you like to see there?

Yes, I included it in my document and then áéíóúñç was shown as áéíóúñç,
as you said.
However, the output method is html, so I was expecting the result to be
&aacute;&eacute;&iacute;&oacute;&uacute;&ntilde;&ccedil;. Out of
curiosity, why is the script tag so different from other tags? My
problem is that, as it is used in HTML to specify the JavaScript blocks,
I might get into trouble whenever I generate JavaScript functions from
the XML. And from my personal point of view: even though it is not a
bug, isn't this "just the script tag behaves like that" a probable
source of "phantom" problems? Wouldn't it be better for all the tags to
behave consistently? I guess that, as I don't know the reason why script
is special, I don't see the point of its special behaviour.
 
> > Just to complicate things a little more, {@...} works fine whereas
> > <xsl:value-of ...> does NOT. I hope this is not the "designed behaviour"
> > ;). Or is the <script> tag special for a reason I should be aware of?
> 
> I'm not sure what you mean here.  In your TEST template, I added the
> line:
> 
>         <debugelement myattr="{@VALUE}" />
> 
> It showed up in the output with the literal characters.  Is this what
> you mean?
> Gary

No, that was my own fault: I thought so because it was working for me in
another stylesheet. I checked again and saw that it was working fine
there because it was enclosed in another tag. So I had inadvertently
used my own workaround. I'm sorry for the confusion.

Many thanks again,
Dan

Re: Another bug with encoding="..."

Posted by Gary L Peskin <ga...@firstech.com>.
Daniel Lopez wrote:
> 
> Hi Gary,

  Hi.

> 
> Eummmm, I might not have expressed myself quite well. I have already
> specified encoding="iso-8859-1" and method="html" in the XML file and in
> the XSLT sheet.

There are two encodings at work in the stylesheet.  The encoding
pseudoattribute in the XML Declaration at the beginning of the
stylesheet specifies the encoding used for the stylesheet itself.  In
your original email, I do see that you specified this.

However, there is _also_ an encoding attribute for the xsl:output
element.  That is the one that I mentioned in my previous reply.  This
did NOT show up in your example.  This attribute covers the encoding to
be used for the OUTPUT document.  If you add this, you'll see that the
content exactly matches what is shown in your input XML when in the
<script> tag.

> My problem is that for some reason it is being ignored
> JUST inside the <script> tag.
> If you look at the test case I sent in my first mail, you'll see that it
> is working as expected in one place and then failing in another place
> (namely inside the <script> tag).

The output encoding is _not_ being ignored in the <script> tag.  Since
you did not specify the encoding attribute on the xsl:output element, it
defaulted to UTF-8 and this is the encoding that you see.  If you change
your stylesheet xsl:output element to explicitly specify
encoding="iso-8859-1", you'll see your correct characters inside the
script.  Inside the script tag, characters outside the ASCII range are
_not_ changed to character entity references.  This is special behavior
for the script tag and is usually appropriate.  Is there a problem with
having the literal value shown in the script portion of your output html
document?  What would you like to see there?
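
As a rough illustration of the output-encoding point (a sketch, not code
from this thread): the program below uses the standard JAXP API
(javax.xml.transform) rather than the Xalan 1.2 API discussed here, and
the class and method names are made up for the example. It runs a
cut-down version of the test stylesheet twice, changing only the
encoding attribute of xsl:output, and decodes the resulting bytes as
ISO-8859-1. Exact serializer behaviour may differ between XSLT
processors and versions.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class; none of these names come from Xalan or the thread.
final class OutputEncodingDemo {
    // The test value from test.xml, written with Unicode escapes so the
    // example does not depend on the encoding of this source file.
    static final String VALUE = "\u00E1\u00E9\u00ED\u00F3\u00FA\u00F1\u00E7";
    static final String XML = "<TEST VALUE=\"" + VALUE + "\"/>";

    // A cut-down version of test.xslt with a configurable output encoding.
    static String stylesheet(String outputEncoding) {
        return "<xsl:stylesheet xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\""
             + " version=\"1.0\">"
             + "<xsl:output method=\"html\" encoding=\"" + outputEncoding + "\"/>"
             + "<xsl:template match=\"/\"><html><script>"
             + "<xsl:value-of select=\"TEST/@VALUE\"/>"
             + "</script></html></xsl:template>"
             + "</xsl:stylesheet>";
    }

    // Runs the transformation and returns the serialized output as raw bytes.
    static byte[] transform(String outputEncoding) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer(
                new StreamSource(new StringReader(stylesheet(outputEncoding))));
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            t.transform(new StreamSource(new StringReader(XML)), new StreamResult(out));
            return out.toByteArray();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Decode both results as ISO-8859-1, the way a browser would if it
        // assumed the page was ISO-8859-1.
        String latin1 = new String(transform("iso-8859-1"), StandardCharsets.ISO_8859_1);
        String utf8 = new String(transform("UTF-8"), StandardCharsets.ISO_8859_1);
        System.out.println("iso-8859-1 output readable as ISO-8859-1: "
            + latin1.contains(VALUE));
        System.out.println("UTF-8 output readable as ISO-8859-1: "
            + utf8.contains(VALUE));
    }
}
```

The bytes are collected in a ByteArrayOutputStream rather than a Writer
so that the serializer, not the Java program, decides the encoding.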

> Just to complicate things a little more, {@...} works fine whereas
> <xsl:value-of ...> does NOT. I hope this is not the "designed behaviour"
> ;). Or is the <script> tag special for a reason I should be aware of?

I'm not sure what you mean here.  In your TEST template, I added the
line:

	<debugelement myattr="{@VALUE}" />

It showed up in the output with the literal characters.  Is this what
you mean?

Gary

> 
> Gary L Peskin wrote:
> >
> > Daniel Lopez wrote:
> > > Thanks for your help Gary, does this mean you fixed it or that you
> > > discovered the problem and somebody will fix it?
> >
> > Actually, I don't think this is a bug at all.  This seems to be "working
> > as designed".
> >
> > Try changing the xsl:output element in your stylesheet to
> >
> >         <xsl:output method="html" indent="yes" encoding="iso-8859-1"/>
> >
> > I think adding the encoding attribute will give you the results you are
> > looking for.
> >
> > BTW, I think your workaround is quite clever.
> >
> > Gary

Re: Another bug with encoding="..."

Posted by Daniel Lopez <D....@uib.es>.
Hi Gary,

Eummmm, I might not have expressed myself quite well. I have already
specified encoding="iso-8859-1" and method="html" in the XML file and in
the XSLT sheet. My problem is that for some reason it is being ignored
JUST inside the <script> tag.
If you look at the test case I sent in my first mail, you'll see that it
is working as expected in one place and then failing in another place
(namely inside the <script> tag).
Just to complicate things a little more, {@...} works fine whereas
<xsl:value-of ...> does NOT. I hope this is not the "designed behaviour"
;). Or is the <script> tag special for a reason I should be aware of?
Thanks for your comments and interest,
Dan

Gary L Peskin wrote:
> 
> Daniel Lopez wrote:
> > Thanks for your help Gary, does this mean you fixed it or that you
> > discovered the problem and somebody will fix it?
> 
> Actually, I don't think this is a bug at all.  This seems to be "working
> as designed".
> 
> Try changing the xsl:output element in your stylesheet to
> 
>         <xsl:output method="html" indent="yes" encoding="iso-8859-1"/>
> 
> I think adding the encoding attribute will give you the results you are
> looking for.
> 
> BTW, I think your workaround is quite clever.
> 
> Gary

Re: Another bug with encoding="..."

Posted by Gary L Peskin <ga...@firstech.com>.
Daniel Lopez wrote:
> Thanks for your help Gary, does this mean you fixed it or that you
> discovered the problem and somebody will fix it?

Actually, I don't think this is a bug at all.  This seems to be "working
as designed".

Try changing the xsl:output element in your stylesheet to

	<xsl:output method="html" indent="yes" encoding="iso-8859-1"/>

I think adding the encoding attribute will give you the results you are
looking for.

BTW, I think your workaround is quite clever.

Gary

Re: Another bug with encoding="..."

Posted by Daniel Lopez <D....@uib.es>.
Hi again,

Thanks for your help, Gary. Does this mean you fixed it, or that you
discovered the problem and somebody will fix it?
Anyway, thanks to your confirmation of the weirdness of the behaviour
inside the <script> tag, I kept on investigating and discovered a
workaround, and I wanted to show it to everybody so people can still
work while waiting for the fix. It's the following:
As the problem only shows up when you use <xsl:value-of ...> directly
inside the script tag, you just have to add some kind of useless tag
around your <xsl:value-of ...> call and the output is formatted correctly.
For example, from my previous test case:
------------------------------
<script language="JavaScript">
//
function TestFunc()
{
content =  ' ' +
           <xsl:apply-templates select="TEST"/>
           ' end.'
}
//
</script>
------------------------------
might turn into
------------------------------
<script language="JavaScript">
//
function TestFunc()
{
dummy = '<none>'
content =  ' ' +
           <xsl:apply-templates select="TEST"/>
           ' end.'
dummy = '</none>'
}
//
</script>
------------------------------

and then it works.
I hope this helps
Regards, and thanks again to Gary,
Dan

PS: Does anybody know in which release the fix will be included?

Gary L Peskin wrote:
> 
> Gary L Peskin wrote:
> > I don't know enough about encodings to understand why the display of the
> > characters is changed from the representation on the input.
> 
> Well, I just did some quick work on this and I figured out that the
> output was the UTF-8 representation of the input while inside the
> script.
> 
> Gary

Re: Another bug with encoding="..."

Posted by Gary L Peskin <ga...@firstech.com>.
Gary L Peskin wrote:
> I don't know enough about encodings to understand why the display of the
> characters is changed from the representation on the input.

Well, I just did some quick work on this and I figured out that the
output was the UTF-8 representation of the input while inside the
script.
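
To make the "UTF-8 representation" remark concrete, here is a minimal
sketch (the class and method names are invented for illustration) of
what happens when the UTF-8 bytes of the test value are read back as
ISO-8859-1: each accented letter becomes two bytes in UTF-8, so a
reader that assumes ISO-8859-1 sees two characters per letter.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper; it only illustrates the byte-level effect.
final class Utf8AsLatin1 {
    // Encodes text as UTF-8, then decodes those bytes as ISO-8859-1,
    // which is what a viewer does when it assumes the wrong encoding.
    static String misread(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // The value from test.xml, written with Unicode escapes so the
        // example does not depend on the encoding of this source file.
        String value = "\u00E1\u00E9\u00ED\u00F3\u00FA\u00F1\u00E7";
        String seen = misread(value);
        System.out.println(value.length() + " characters in, "
            + seen.length() + " characters out");
        // Print the code points so the result is readable on any console.
        for (char c : seen.toCharArray()) {
            System.out.printf("U+%04X ", (int) c);
        }
        System.out.println();
    }
}
```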

Gary

Re: Another bug with encoding="..."

Posted by Gary L Peskin <ga...@firstech.com>.
Daniel Lopez wrote:
> 
> Hi,
> 
> I've found another bug regarding the use of encoding="iso-8859-1".
> [snip]
> Am I doing something wrong? Is the script tag treated in a different way
> for any reason?

I haven't looked into this too deeply, but I did find that the script
tag is treated differently.  If you look in the source at
org.apache.xalan.xpath.xml.FormatterToHTML, there is a static method
that sets up the m_elementFlags table entries.  The entry for the
<script> tag contains a RAW attribute which, I think, is causing the
characters to be copied "as is" without conversion to entities.
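
The idea of a per-element "raw" flag can be sketched roughly as follows.
This is NOT the FormatterToHTML source: the class, the table, and the
method names are all hypothetical, and the entity map only covers the
seven characters from the test case. It just shows how a serializer
could skip entity conversion for text inside flagged elements.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; it does not reproduce Xalan's m_elementFlags table.
final class RawElementSketch {
    // Elements whose text content is written without entity conversion.
    static final Set<String> RAW_ELEMENTS = Set.of("script");

    // A tiny entity table covering only the characters in the test case.
    static final Map<Character, String> ENTITIES = Map.of(
        '\u00E1', "aacute", '\u00E9', "eacute", '\u00ED', "iacute",
        '\u00F3', "oacute", '\u00FA', "uacute", '\u00F1', "ntilde",
        '\u00E7', "ccedil");

    // Returns the text as it would be written inside the given element.
    static String writeText(String element, String text) {
        if (RAW_ELEMENTS.contains(element.toLowerCase(Locale.ROOT))) {
            return text; // raw element: copy the characters "as is"
        }
        StringBuilder out = new StringBuilder();
        for (char c : text.toCharArray()) {
            String name = ENTITIES.get(c);
            if (name != null) {
                out.append('&').append(name).append(';');
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String value = "\u00E1\u00E9\u00ED\u00F3\u00FA\u00F1\u00E7";
        System.out.println("body:   " + writeText("body", value));
        System.out.println("script: " + writeText("script", value));
    }
}
```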

I don't know enough about encodings to understand why the display of the
characters is changed from the representation on the input.  But this
should confirm that the character representation inside the script tag
is treated differently from that in other tags.

HTH,
Gary