Posted to j-users@xalan.apache.org by Ed Manners <Ed...@InfiniteAgent.com> on 2005/10/19 15:50:52 UTC

Support for Numeric entities refs in HTML output

Currently XALAN outputs character entity references when the output type is
HTML. This is particularly relevant when you are outputting extended
characters (umlauts, accents, etc). Is it possible to configure it to output
numeric entity references instead (through some property setting)? A
proprietary HTML client I am working with doesn't support character entities.

I would expect some setting like xalan:entity_ref_type=numeric.
--
No virus found in this outgoing message.
Checked by AVG Anti-Virus.
Version: 7.0.344 / Virus Database: 267.12.4/142 - Release Date: 18/10/2005

Re: Support for Numeric entities refs in HTML output

Posted by Brian Minchau <mi...@ca.ibm.com>.
Stanimir is correct in saying that one can disable the HTML entities by
setting the Xalan-specific output property on a Transformer instance:

     transformer.setOutputProperty(
             "{http://xml.apache.org/xalan}entities", "");

This property is documented at
http://xml.apache.org/xalan-j/usagepatterns.html#outputprops

It is documented as a property on the Xalan serializer, and it mentions
modifying the properties by editing output_html.properties. It also
mentions setting Xalan-specific properties via the <xsl:output ...>
element. It doesn't mention setting Xalan-specific properties via JAXP, but
this is most likely just a hole in the documentation. I'll raise a JIRA
issue and get this resolved one way or the other.
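
A minimal sketch of setting this property through JAXP follows. The class
name EntitiesPropertyDemo, the transform() helper, and the inline stylesheet
are made up for illustration; only the property key and the empty-string
value come from this thread. It assumes Xalan-J is the JAXP implementation
that gets picked up. Because the key is namespace-qualified, other
processors should simply ignore it rather than reject it.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class EntitiesPropertyDemo {
    // Xalan-specific output property named in this thread.
    static final String ENTITIES = "{http://xml.apache.org/xalan}entities";

    // Hypothetical helper: applies an inline stylesheet to inline XML,
    // setting the entities property before serializing.
    static String transform(String xslt, String xml, String entitiesValue)
            throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
        t.setOutputProperty(ENTITIES, entitiesValue);
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        String xslt = "<xsl:stylesheet version='1.0'"
            + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:output method='html' encoding='US-ASCII'/>"
            + "<xsl:template match='/'><p><xsl:value-of select='.'/></p></xsl:template>"
            + "</xsl:stylesheet>";
        // Sample input with an umlaut and a sharp s; inspect the printed
        // output to see how your processor writes them.
        String xml = "<t>Gr\u00FC\u00DFe</t>";
        System.out.println(transform(xslt, xml, ""));
    }
}
```

What exactly gets written for the extended characters depends on the
processor and the output encoding, so check the printed result against
your client rather than relying on this sketch.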

Since you are already skating on the thin ice of non-portability involved
with Xalan-J, here is a further option. That webpage also says that you can
provide your own properties file with entities. The default value for the
entities property, for HTML output, is
    org/apache/xml/serializer/HTMLEntities

So you could set this property to your own properties file (class), rather
than the empty string, and in that properties file map characters to entity
names. If you look into org/apache/xml/serializer/HTMLEntities.properties
you will see:
#
# Character entity references for markup-significant
#
quot=34
amp=38
lt=60
gt=62
...

The character '<' with codepoint 60 will be written out as &lt; and so on.
If you choose to create your own properties file, I think you get the idea.
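
To make the name=codepoint format concrete, here is a small illustration of
how such a table reads. This is not Xalan's serializer code; the class
EntityTableSketch and its methods are hypothetical, and the sketch only
shows the mapping direction (codepoint to entity name) described above. What
the real serializer does with characters that have no entry is not modelled
here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

class EntityTableSketch {
    // Reads "name=codepoint" lines and inverts them to codepoint -> name.
    static Map<Integer, String> load(String propertiesText) throws IOException {
        Properties p = new Properties();
        p.load(new StringReader(propertiesText));
        Map<Integer, String> byCodePoint = new HashMap<>();
        for (String name : p.stringPropertyNames()) {
            byCodePoint.put(Integer.parseInt(p.getProperty(name).trim()), name);
        }
        return byCodePoint;
    }

    // Named reference if the table has an entry, otherwise the character itself.
    static String render(int codePoint, Map<Integer, String> table) {
        String name = table.get(codePoint);
        return name != null
                ? "&" + name + ";"
                : new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) throws IOException {
        // Only the four markup-significant entries quoted above.
        Map<Integer, String> table = load("quot=34\namp=38\nlt=60\ngt=62\n");
        System.out.println(render('<', table));   // prints &lt;
        System.out.println(render(0xFC, table));  // no entry, so the raw character
    }
}
```

A custom file that keeps only these four entries would, by the same logic,
leave the serializer with no names for umlauts and accents.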

However, one should be very careful here. The less deep you go into the
internals of Xalan the better.
- Brian

- - - - - - - - - - - - - - - - - - - -
Brian Minchau
XSLT Development, IBM Toronto
e-mail:        minchau@ca.ibm.com


Re: Support for Numeric entities refs in HTML output

Posted by Stanimir Stamenkov <st...@myrealbox.com>.
/Stanimir Stamenkov/:
> /Ed Manners/:
>> -----Original Message-----
>> From: Stanimir Stamenkov [mailto:stanio@myrealbox.com]
>> Sent: 19 October 2005 14:58
>> To: xalan-j-users@xml.apache.org
>> Subject: Re: Support for Numeric entities refs in HTML output
>>
>>> http://xml.apache.org/xalan-j/usagepatterns.html#outputprops
>>>
>>> (look for 'xalan:entities' a little further down)
>>
>> I have checked under both links for some clue, but nothing like this 
>> property (for turning on numeric entity references for HTML output) 
>> has come up.
> 
> Try setting that property to 'null' or an empty 'java.util.Properties' object.

Ah, no. More probably, you could set it to an empty string:

     transformer.setOutputProperty(
             "{http://xml.apache.org/xalan}entities", "");

Or you could get the current output properties of a Transformer, 
remove the property and set the Transformer output properties again.
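
A rough sketch of that second route, untested against Xalan itself. The
class RemoveEntitiesProperty and the helper withoutEntities() are
hypothetical names. One caveat that applies to java.util.Properties in
general: remove() only drops an explicitly set entry, so a value supplied
through a defaults table underneath would still be visible via
getProperty().

```java
import java.util.Properties;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;

class RemoveEntitiesProperty {
    static final String ENTITIES = "{http://xml.apache.org/xalan}entities";

    // Copies the current output properties, drops the entities key,
    // and hands the result back to the Transformer.
    static Properties withoutEntities(Transformer t) {
        Properties props = t.getOutputProperties();
        props.remove(ENTITIES);
        t.setOutputProperties(props);
        return props;
    }

    public static void main(String[] args) throws TransformerConfigurationException {
        // Identity transformer, used here only to have something to configure.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        Properties props = withoutEntities(t);
        // containsKey() looks at explicit entries only, not layered defaults.
        System.out.println(props.containsKey(ENTITIES)); // prints false
    }
}
```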

-- 
Stanimir


Re: Support for Numeric entities refs in HTML output

Posted by Stanimir Stamenkov <st...@myrealbox.com>.
/Ed Manners/:
> -----Original Message-----
> From: Stanimir Stamenkov [mailto:stanio@myrealbox.com]
> Sent: 19 October 2005 14:58
> To: xalan-j-users@xml.apache.org
> Subject: Re: Support for Numeric entities refs in HTML output
> 
>> http://xml.apache.org/xalan-j/usagepatterns.html#outputprops
>> 
>> (look for 'xalan:entities' a little further down)
> 
> I have checked under both links for some clue, but nothing like this 
> property (for turning on numeric entity references for HTML output) has come 
> up.

Try setting that property to 'null' or an empty 'java.util.Properties'
object.

-- 
Stanimir


RE: Support for Numeric entities refs in HTML output

Posted by Ed Manners <Ed...@InfiniteAgent.com>.
I have checked under both links for some clue, but nothing like this
property (for turning on numeric entity references for HTML output) has come
up.
Thanks.




Re: Support for Numeric entities refs in HTML output

Posted by Stanimir Stamenkov <st...@myrealbox.com>.
/Ed Manners/:

> Currently XALAN outputs character entity references when the output type
> is HTML. This is particularly relevant when you are outputting extended
> characters (umlauts, accents, etc). Is it possible to configure it to
> output numeric entity references instead (through some property
> setting)? A proprietary HTML client I am working with doesn't support
> character entities.
>
> I would expect some setting like xalan:entity_ref_type=numeric.

You may try searching the mailing list archive:

http://marc.theaimsgroup.com/?l=xalan-j-users

And the answer to your question is documented on:

http://xml.apache.org/xalan-j/usagepatterns.html#outputprops

(look for 'xalan:entities' a little further down)

-- 
Stanimir


RE: Support for Numeric entities refs in HTML output

Posted by Ed Manners <Ed...@InfiniteAgent.com>.
John,
Thanks for that. I'm not sure that works for my context.

But just to clarify, do you have character references contained in the
stylesheet, but not in the source XML (to be transformed)?
e.g.

<!DOCTYPE xsl:stylesheet [
    <!ENTITY nbsp "&#x00A0;">
    ]>

<xsl:stylesheet version="1.0"
	xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="xml" indent="no" omit-xml-declaration="yes"
encoding="ISO-8859-1"/>

<xsl:template match="/">
	....
	<xsl:text>&nbsp;</xsl:text>
</xsl:template>

</xsl:stylesheet>

Would it be possible to post a sample of the stylesheet you are using so I
can get a clearer understanding?

Thanks,
Ed.





Re: Support for Numeric entities refs in HTML output

Posted by John Gentilin <ge...@eyecatching.com>.
Ed,

We have a different problem, and I am not sure we are even doing it
correctly, but I think the results from our solution will solve your
problem. To speed up transformation time, we do not include DTD
declarations. Since we do that, we run into the problem that the XML
parser won't parse our stylesheets if we include some of the standard
HTML entities. So what we do is add a small custom DTD to the top of our
stylesheet, before the <xsl:stylesheet> element:

<!DOCTYPE xsl:stylesheet [
    <!ENTITY nbsp "&#x00A0;">
    ]>

Now when we transform with the stylesheet, the output is the binary
representation of the character, i.e. depending on the charset, 1 or 2
bytes, and not a named entity reference or a numeric character
reference (a hex or decimal string such as #xxxxxx).

So using the example above, &nbsp; will be serialized as the single
binary character 0xA0.
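
A small sketch for trying this out. The class InternalEntityDemo and the
inline stylesheet are made up for illustration, modelled on the DOCTYPE
above. It assumes the XML output method with UTF-8 encoding; with
method="html" or a narrower encoding the serializer may well write the
character differently, so treat the printed result as something to check on
your own setup.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class InternalEntityDemo {
    // Stylesheet whose internal DTD subset declares nbsp, as in the message above.
    static final String XSLT =
        "<!DOCTYPE xsl:stylesheet [ <!ENTITY nbsp \"&#x00A0;\"> ]>"
        + "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\" encoding=\"UTF-8\"/>"
        + "<xsl:template match=\"/\"><p>a<xsl:text>&nbsp;</xsl:text>b</p></xsl:template>"
        + "</xsl:stylesheet>";

    // The XML parser expands the entity while reading the stylesheet, so the
    // XSLT processor only ever sees the character, not the name "nbsp".
    static String run() throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader("<doc/>")), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        String result = run();
        // Reports whether the no-break space reached the output as a literal character.
        System.out.println(result.indexOf((char) 0xA0) >= 0);
    }
}
```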

HTH
John G



-- 
--------------------------------------
John Gentilin
Eye Catching Solutions Inc.
18314 Carlwyn Drive
Castro Valley CA 94546

    Contact Info
gentijo@eyecatching.com
Ca Office 1-510-881-4821
NJ Office 1-732-422-4917


RE: Support for Numeric entities refs in HTML output

Posted by Ed Manners <Ed...@InfiniteAgent.com>.
Thanks for the quick reply, but it's not that I want to choose between
decimal and hex; I want to change the output behaviour so that it outputs
numeric character references rather than character entity references.

Or are you saying that, because there is no designed-in switch for
hex versus decimal, my request is not feasible at this time?






Re: Support for Numeric entities refs in HTML output

Posted by Joseph Kesselman <ke...@us.ibm.com>.
Minor terminology point: these character escapes are known as "numeric
character references".

As of last time I checked (some time ago, admittedly), we didn't have a
designed-in switch for selecting decimal versus hex representation of
character references. It wouldn't be an entirely unreasonable feature to
add, if it can be done without performance impact.

But I'd suggest fixing the other tool first, or as well, to accept both
decimal and hex syntaxes. The next system it has to interoperate with may
not be alterable...
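
For readers unsure of the difference between the two syntaxes, here is a
small standalone illustration. It is not part of Xalan; the class
NumericCharRefs and the method escapeNonAscii() are hypothetical, and the
sketch deliberately leaves markup characters such as '<' and '&' alone.

```java
class NumericCharRefs {
    // Replaces every non-ASCII code point with a numeric character reference.
    // hex=false gives the decimal form, hex=true gives the hexadecimal form.
    static String escapeNonAscii(String text, boolean hex) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (cp < 0x80) {
                sb.append((char) cp);
            } else if (hex) {
                sb.append("&#x").append(Integer.toHexString(cp).toUpperCase()).append(';');
            } else {
                sb.append("&#").append(cp).append(';');
            }
            // Advance by one or two chars, so surrogate pairs stay together.
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String word = (char) 0xFC + "ber"; // u-umlaut followed by "ber"
        System.out.println(escapeNonAscii(word, false)); // prints &#252;ber
        System.out.println(escapeNonAscii(word, true));  // prints &#xFC;ber
    }
}
```

Both forms name the same character, which is why a client that accepts one
should ideally accept the other.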

______________________________________
Joe Kesselman, IBM Next-Generation Web Technologies: XML, XSL and more.
"The world changed profoundly and unpredictably the day Tim Berners Lee
got bitten by a radioactive spider." -- Rafe Culpin, in r.m.filk