Posted to dev@xalan.apache.org by da...@us.ibm.com on 2003/09/24 18:19:31 UTC

Re: Numeric entity problem




> Questions:
>
> 1. Is this a bug in Xalan? From my point of view, it should leave the
> numeric entity in the text payload untouched, since it is proper XML.

No, it's not a bug in Xalan.  Rather, it's a bug in your understanding of
XML processing.

> 2. If not, is there a way to disable this "feature" in Xalan, so that
> these perfectly legal numeric entities are let through in the
> serialization?

No, there is no way to disable this, because it's not a "feature" -- it's
required behavior.
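
To illustrate why this is required: a DOM text node holds characters, not markup, so a string such as "&#253;" placed into it is six literal characters beginning with "&", and a conforming serializer must escape that "&". The following is a minimal sketch using the standard JAXP identity transformer; the class name, method name, and the "payload" element are made up for illustration and are not taken from the attached code.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class; not part of the original poster's code.
class EscapedAmpersandSketch {

    // Builds <payload>text</payload> in a DOM and serializes it with an
    // identity transform. The text is stored as literal characters.
    static String serializeWithText(String text) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = doc.createElement("payload");
            root.appendChild(doc.createTextNode(text));
            doc.appendChild(root);

            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The '&' in the text node is escaped on output, giving &amp;#253;
        System.out.println(serializeWithText("&#253;"));
    }
}
```

Parsing the serialized result gives back exactly the six characters that were stored, which is the round trip the serializer has to preserve.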

> 3. If not, any suggestions on how to solve the problem?

You are trying to modify the content of a document to control aspects of
serialization, which will never work.  If you want pure ASCII
serialization, then create a serializer with an encoding of "US-ASCII", and
the serializer will emit all non-ASCII characters as numeric character
references, which means you can skip step 5 of your process.
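
A minimal sketch of that approach, assuming the DOM tree is serialized through a JAXP identity transformer (as step 7 of the original message suggests). The class name, method name, and "payload" element are hypothetical; only the "US-ASCII" encoding setting comes from the advice above.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class; not part of the original poster's code.
class AsciiSerializationSketch {

    // Stores the real characters in the DOM (no hand-made "&#...;" strings)
    // and asks the serializer for US-ASCII output.
    static String serializeAsAscii(String text) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = doc.createElement("payload");
            root.appendChild(doc.createTextNode(text));
            doc.appendChild(root);

            Transformer identity = TransformerFactory.newInstance().newTransformer();
            // Characters that US-ASCII cannot represent are written as
            // numeric character references by the serializer.
            identity.setOutputProperty(OutputKeys.ENCODING, "US-ASCII");

            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            identity.transform(new DOMSource(doc), new StreamResult(bytes));
            return new String(bytes.toByteArray(), StandardCharsets.US_ASCII);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // U+00FD (y with acute) is outside ASCII, so it should appear
        // in the output as the character reference &#253;
        System.out.println(serializeAsAscii("\u00fd"));
    }
}
```

With this in place the translated string can go into the text node unchanged, and the encoding choice stays a property of the output rather than of the document content.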

Dave

PS. Please don't cross-post!




From:    "Erik Ytterman" <erik.ytterman@ferrologic.com>
To:      <xa...@xml.apache.org>, <xa...@xml.apache.org>
cc:      "'Beatrice Nilsson'" <be...@ferrologic.com>, (bcc: David N Bertoni/Cambridge/IBM)
Date:    09/24/2003 02:11 AM
Subject: Numeric entity problem



Dear All!

I'm struggling with a problem that needs to be solved as soon as possible.
I hope that you will be able to help me. I will attach parts of the code.

I'm doing the following:

1. Receive a callback with a proper XML document.
(DocumentHandler.handleDocument())

2. Use XPath to find the element to process
(DocumentHandler.translateDocument())

3. Find the text content of this element.
(DocumentHandler.translateDocument())

4. Translate the textual content of the element.
(OpenB2BUtil.translateString())

5. An ugly hack to transform all non-ASCII characters into numeric
entities. (OpenB2BUtil.etitifyIsoString())

6. Replace the textual content of the element, including numeric entities
(DocumentHandler.translateDocument())

7. Serialize the resulting DOM tree using transformers
(OpenB2BUtil.documentToStream())

Problem:

As can be seen from the code, I replace the textual content of an element
with a string that contains numeric entities (&#253;). My problem is that
the serialization seems to translate this into (&amp;#253;).

Questions:

1. Is this a bug in Xalan? From my point of view, it should leave the
numeric entity in the text payload untouched, since it is proper XML.

2. If not, is there a way to disable this "feature" in Xalan, so that
these perfectly legal numeric entities are let through in the
serialization?

3. If not, any suggestions on how to solve the problem?

/Erik




(See attached file: OpenB2BUtil.java)
(See attached file: DocumentHandler.java)