
Posted to general@xml.apache.org by Jacob Kjome <ho...@visi.com> on 2006/04/08 07:32:28 UTC

why is entity ref expanded in the internal subset (and related questions)?...

I'm having a problem with the following...

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE vxml SYSTEM "voicexml1-0.dtd" [
	<!ENTITY % BigEntity SYSTEM "BigEntity.ent">
	%BigEntity;
]>
<vxml version="1.0">
	<form id="init">
		<block>
			&BigEntity;
		</block>
	</form>
</vxml>

BigEntity.ent looks something like this...

<?xml version="1.0" encoding="UTF-8"?>
<!ENTITY BigEntity '
		<prompt>another prompt</prompt>
		<prompt>another prompt</prompt>
		<prompt>another prompt</prompt>
		<prompt>another prompt</prompt>
		<prompt>another prompt</prompt>
		<prompt>another prompt</prompt>
		<prompt>another prompt</prompt>
'>

When internalEntityDecl(String name, XMLString text, XMLString 
nonNormalizedText, Augmentations augs) gets called, both the 
text.toString() and nonNormalizedText.toString() contain the fully 
expanded contents of the entity (a bunch of <prompt>another 
prompt</prompt> entries).  After I build this into a DOM and 
serialize it, it looks like...

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE vxml SYSTEM "voicexml1-0.dtd"
  [
<!ENTITY %BigEntity SYSTEM "BigEntity.ent">
<!ENTITY BigEntity "
		<prompt>another prompt</prompt>
		<prompt>another prompt</prompt>
		<prompt>another prompt</prompt>
		<prompt>another prompt</prompt>
		<prompt>another prompt</prompt>
		<prompt>another prompt</prompt>
		<prompt>another prompt</prompt>

">
]>
<vxml version="1.0"><form id="init"><block>&BigEntity;</block></form></vxml>

Why are the contents expanded?  Why not just give me "%BigEntity;" 
from the original document?  Is there any recourse?

A second problem is that reparsing this file gives me the following 
parsing error...

test9.gen.xml:4: Error: White space is required after "<!ENTITY" in 
the entity declaration.

The parse error seems to be with the external entity declaration 
"%BigEntity".  If I change that to "% BigEntity", the parse error 
goes away.  But, the "name" parameter passed to me in the 
externalEntityDecl() method is exactly "%BigEntity".  If the parser 
knows it can't deal with that, why doesn't it pass me "% 
BigEntity"?  Seems odd that I'd be forced to split the "%" apart from 
the rest of the entity name manually.


thanks,

Jake


---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscribe@xml.apache.org
For additional commands, e-mail: general-help@xml.apache.org

Re: why is entity ref expanded in the internal subset (and related questions)?...

Posted by Michael Glavassevich <mr...@ca.ibm.com>.

Hi Jacob,

Jacob Kjome <ho...@visi.com> wrote on 04/08/2006 06:09:30 PM:

> Still haven't found a good answer to my previous question below, but at 
> least I have it generally working with the expanded Entity and a hack 
> to turn "%BigEntity" into "% BigEntity" so that I don't get a parse 
> error upon reparsing the serialized document.  However, I've got a 
> larger issue than any of that, and I think it is my last major issue 
> to figure out....
> 
> When I add EntityReference nodes to the DOM I'm building up, they 
> don't get expanded in the parse tree.  That is, when I print out the 
> DOM tree, all I see are the EntityReference nodes, but none of their 
> children, even though I iterate recursively over all the child 
> nodes.  For instance, for the example listed in my previous question 
> (below), here is the parse tree...
> 
> Document MIME type: null
> Document encoding: UTF-8
> DOM hierarchy:
>      XercesLinkedDocument:
>          DocumentTypeImpl: name=vxml
>           internalSubset=
> <!ENTITY % BigEntity SYSTEM "BigEntity.ent">
> <!ENTITY BigEntity "
>                  <prompt>another prompt</prompt>
>                  <prompt>another prompt</prompt>
>                  <prompt>another prompt</prompt>
>                  <prompt>another prompt</prompt>
>                  <prompt>another prompt</prompt>
>                  <prompt>another prompt</prompt>
>                  <prompt>another prompt</prompt>
> ">
> 
>          ElementNSImpl: vxml
>              Attributes:
>                  AttrImpl: version
>                      TextImpl: 1.0
>              ElementImpl: form
>                  Attributes:
>                      AttrImpl: id
>                          TextImpl: init
>                  ElementImpl: block
>                      TextImpl:
> 
>                      EntityReferenceImpl: name=BigEntity
>                      TextImpl:
> 
> 
> Notice that EntityReferenceImpl has no children.  I expected it to 
> include child nodes of the EntityReference; that is, nodes 
> representing the 7 cases of "<prompt>another prompt</prompt>" defined 
> in the Entity "BigEntity".
> 
> I shouldn't have to manually populate the children of the 
> EntityReference, should I?  I have noticed that I get events on the 
> EntityReference contents as I parse the document using XNI, but all I 
> should have to do for the DOM is simply do the following, no?...
> 
>          EntityReference entityRef = 
> fDocument.createEntityReference(entityName);
>          fCurrentNode.appendChild(entityRef);
> 
> ...where "fCurrentNode" is the current parent node to which children 
> are being appended.  I've read that EntityReference children might be 
> lazily expanded, so that if they aren't accessed no work is 
> performed, but I am accessing the EntityReference children in order 
> to print the DOM tree.  Why don't they show up???  Shouldn't the DOM 
> do this for me?  It clearly has all the information it needs.  What 
> am I missing?

An EntityReference created with Document.createEntityReference() [1] will 
only have children if there's a corresponding Entity node and the 
replacement text for the Entity is available. Strictly using the DOM 
interfaces, you cannot create Entity nodes or modify them, since they're 
read-only. EntityReference nodes and their children are also read-only. If 
you're wondering how Xerces builds these things, take a look at 
AbstractDOMParser [2] and EntityReferenceImpl [3] (particularly 
synchronizeChildren()).
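
A minimal sketch of the situation described above, assuming the DOM 
implementation bundled with the JDK (which is derived from Xerces). The 
class and method names (EntityRefSketch, newRef, rejectsNewChildren) are 
made up for illustration. The document is built by hand, so it has no 
DocumentType and therefore no Entity node for "BigEntity":

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.EntityReference;

final class EntityRefSketch {
    /** Creates a reference in a document that has no matching Entity node. */
    static EntityReference newRef() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        doc.appendChild(doc.createElement("block"));
        EntityReference ref = doc.createEntityReference("BigEntity");
        doc.getDocumentElement().appendChild(ref);
        return ref;
    }

    /** Returns true if the DOM refuses to add a child to the reference. */
    static boolean rejectsNewChildren(EntityReference ref) {
        try {
            ref.appendChild(ref.getOwnerDocument().createTextNode("another prompt"));
            return false;
        } catch (DOMException e) {
            return e.code == DOMException.NO_MODIFICATION_ALLOWED_ERR;
        }
    }

    public static void main(String[] args) throws Exception {
        EntityReference ref = newRef();
        System.out.println("has children: " + ref.hasChildNodes());
        System.out.println("read-only: " + rejectsNewChildren(ref));
    }
}
```

With no Entity node to copy from, the reference should stay empty, and 
because it is read-only the standard DOM API gives no way to fill it in 
afterwards.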

> thanks,
> 
> Jake

[1] http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#ID-392B75AE
[2] http://svn.apache.org/viewcvs.cgi/xerces/java/trunk/src/org/apache/xerces/parsers/AbstractDOMParser.java
[3] http://svn.apache.org/viewcvs.cgi/xerces/java/trunk/src/org/apache/xerces/dom/EntityReferenceImpl.java


Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org


Re: why is entity ref expanded in the internal subset (and related questions)?...

Posted by Jacob Kjome <ho...@visi.com>.

Still haven't found a good answer to my previous question below, but at 
least I have it generally working with the expanded Entity and a hack 
to turn "%BigEntity" into "% BigEntity" so that I don't get a parse 
error upon reparsing the serialized document.  However, I've got a 
larger issue than any of that, and I think it is my last major issue 
to figure out....

When I add EntityReference nodes to the DOM I'm building up, they 
don't get expanded in the parse tree.  That is, when I print out the 
DOM tree, all I see are the EntityReference nodes, but none of their 
children, even though I iterate recursively over all the child 
nodes.  For instance, for the example listed in my previous question 
(below), here is the parse tree...

Document MIME type: null
Document encoding: UTF-8
DOM hierarchy:
     XercesLinkedDocument:
         DocumentTypeImpl: name=vxml
          internalSubset=
<!ENTITY % BigEntity SYSTEM "BigEntity.ent">
<!ENTITY BigEntity "
                 <prompt>another prompt</prompt>
                 <prompt>another prompt</prompt>
                 <prompt>another prompt</prompt>
                 <prompt>another prompt</prompt>
                 <prompt>another prompt</prompt>
                 <prompt>another prompt</prompt>
                 <prompt>another prompt</prompt>
">

         ElementNSImpl: vxml
             Attributes:
                 AttrImpl: version
                     TextImpl: 1.0
             ElementImpl: form
                 Attributes:
                     AttrImpl: id
                         TextImpl: init
                 ElementImpl: block
                     TextImpl:

                     EntityReferenceImpl: name=BigEntity
                     TextImpl:


Notice that EntityReferenceImpl has no children.  I expected it to 
include child nodes of the EntityReference; that is, nodes 
representing the 7 cases of "<prompt>another prompt</prompt>" defined 
in the Entity "BigEntity".

I shouldn't have to manually populate the children of the 
EntityReference, should I?  I have noticed that I get events on the 
EntityReference contents as I parse the document using XNI, but all I 
should have to do for the DOM is simply do the following, no?...

         EntityReference entityRef = 
fDocument.createEntityReference(entityName);
         fCurrentNode.appendChild(entityRef);

...where "fCurrentNode" is the current parent node to which children 
are being appended.  I've read that EntityReference children might be 
lazily expanded, so that if they aren't accessed no work is 
performed, but I am accessing the EntityReference children in order 
to print the DOM tree.  Why don't they show up???  Shouldn't the DOM 
do this for me?  It clearly has all the information it needs.  What 
am I missing?


thanks,

Jake

At 12:32 AM 4/8/2006, you wrote:
 >
 >I'm having a problem with the following...
 >
 ><?xml version="1.0" encoding="UTF-8"?>
 ><!DOCTYPE vxml SYSTEM "voicexml1-0.dtd" [
 >       <!ENTITY % BigEntity SYSTEM "BigEntity.ent">
 >       %BigEntity;
 >]>
 ><vxml version="1.0">
 >       <form id="init">
 >               <block>
 >                       &BigEntity;
 >               </block>
 >       </form>
 ></vxml>
 >
 >BigEntity.ent looks something like this...
 >
 ><?xml version="1.0" encoding="UTF-8"?>
 ><!ENTITY BigEntity '
 >               <prompt>another prompt</prompt>
 >               <prompt>another prompt</prompt>
 >               <prompt>another prompt</prompt>
 >               <prompt>another prompt</prompt>
 >               <prompt>another prompt</prompt>
 >               <prompt>another prompt</prompt>
 >               <prompt>another prompt</prompt>
 >'>
 >
 >When internalEntityDecl(String name, XMLString text, XMLString
 >nonNormalizedText, Augmentations augs) gets called, both the
 >text.toString() and nonNormalizedText.toString() contain the fully
 >expanded contents of the entity (a bunch of <prompt>another
 >prompt</prompt> entries).  After I build this into a DOM and
 >serialize it, it looks like...
 >
 ><?xml version="1.0" encoding="UTF-8"?>
 ><!DOCTYPE vxml SYSTEM "voicexml1-0.dtd"
 >  [
 ><!ENTITY %BigEntity SYSTEM "BigEntity.ent">
 ><!ENTITY BigEntity "
 >               <prompt>another prompt</prompt>
 >               <prompt>another prompt</prompt>
 >               <prompt>another prompt</prompt>
 >               <prompt>another prompt</prompt>
 >               <prompt>another prompt</prompt>
 >               <prompt>another prompt</prompt>
 >               <prompt>another prompt</prompt>
 >
 >">
 >]>
 ><vxml version="1.0"><form id="init"><block>&BigEntity;</block></form></vxml>
 >
 >Why are the contents expanded?  Why not just give me "%BigEntity;"
 >from the original document?  Is there any recourse?
 >
 >A second problem is that reparsing this file gives me the following
 >parsing error...
 >
 >test9.gen.xml:4: Error: White space is required after "<!ENTITY" in
 >the entity declaration.
 >
 >The parse error seems to be with the external entity declaration
 >"%BigEntity".  If I change that to "% BigEntity", the parse error
 >goes away.  But, the "name" parameter passed to me in the
 >externalEntityDecl() method is exactly "%BigEntity".  If the parser
 >knows it can't deal with that, why doesn't it pass me "%
 >BigEntity"?  Seems odd that I'd be forced to split the "%" apart from
 >the rest of the entity name manually.
 >
 >
 >thanks,
 >
 >Jake
 >



Re: why is entity ref expanded in the internal subset (and related questions)?...

Posted by Michael Glavassevich <mr...@ca.ibm.com>.

Hi Jacob,

Jacob Kjome <ho...@visi.com> wrote on 04/08/2006 01:32:28 AM:

<snip/>

> Why are the contents expanded?  Why not just give me "%BigEntity;" 
> from the original document?  Is there any recourse?

startParameterEntity() [1] and endParameterEntity() [2] notify you of the 
start and end of a parameter entity. You could just ignore the events in 
between them and write the entity reference instead of its replacement 
text. 
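
A rough sketch of that idea follows. It uses the SAX counterparts 
(LexicalHandler.startEntity()/endEntity() and DeclHandler) from the JDK 
instead of the XNI XMLDTDHandler, so that it runs without Xerces on the 
classpath; the same depth-counting approach should carry over to 
startParameterEntity()/endParameterEntity(). The class names, the 
in-memory documents and the file:///sketch/ base URI are all invented for 
the example:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

final class ParamEntitySketch {
    static final String DOC =
          "<!DOCTYPE vxml [\n"
        + "  <!ENTITY % BigEntity SYSTEM \"BigEntity.ent\">\n"
        + "  %BigEntity;\n"
        + "]>\n"
        + "<vxml version=\"1.0\"/>";
    static final String ENT =
        "<!ENTITY BigEntity '<prompt>another prompt</prompt>'>";

    /** Records the internal subset, keeping %name; references unexpanded. */
    static final class SubsetRecorder extends DefaultHandler2 {
        final List<String> subset = new ArrayList<>();
        private int peDepth = 0; // > 0 while inside a parameter entity

        @Override public void startEntity(String name) {
            if (name.startsWith("%")) {
                if (peDepth == 0) subset.add(name + ";"); // write the reference
                peDepth++;
            }
        }
        @Override public void endEntity(String name) {
            if (name.startsWith("%")) peDepth--;
        }
        @Override public void externalEntityDecl(String name, String pub, String sys) {
            if (peDepth == 0) subset.add("decl " + name);
        }
        @Override public void internalEntityDecl(String name, String value) {
            if (peDepth == 0) subset.add("decl " + name); // skipped inside a PE
        }
    }

    static List<String> record() throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        SubsetRecorder h = new SubsetRecorder();
        reader.setProperty("http://xml.org/sax/properties/lexical-handler", h);
        reader.setProperty("http://xml.org/sax/properties/declaration-handler", h);
        // Serve the external parameter entity from memory instead of disk.
        reader.setEntityResolver((publicId, systemId) ->
            systemId != null && systemId.endsWith("BigEntity.ent")
                ? new InputSource(new StringReader(ENT)) : null);
        InputSource in = new InputSource(new StringReader(DOC));
        in.setSystemId("file:///sketch/doc.xml"); // hypothetical base URI
        reader.parse(in);
        return h.subset;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(record());
    }
}
```

The declaration of BigEntity that arrives between the start and end 
events is dropped, so only the parameter entity's own declaration and the 
reference to it are recorded. A document with an external DTD subset 
would also report a "[dtd]" entity, which this sketch does not handle.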
 
> A second problem is that reparsing this file gives me the following 
> parsing error...
> 
> test9.gen.xml:4: Error: White space is required after "<!ENTITY" in 
> the entity declaration.
> 
> The parse error seems to be with the external entity declaration 
> "%BigEntity".  If I change that to "% BigEntity", the parse error 
> goes away.  But, the "name" parameter passed to me in the 
> externalEntityDecl() method is exactly "%BigEntity".  If the parser 
> knows it can't deal with that, why doesn't it pass me "% 
> BigEntity"?  Seems odd that I'd be forced to split the "%" apart from 
> the rest of the entity name manually.

SAX requires that parameter entities be reported [3] with '%' prepended to 
their names. XNI adopted this naming convention since that's what SAX 
expects.
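
Given that convention, a serializer has to split the '%' off again when it 
writes the declaration. A small sketch of one way to do that; the class 
and method names (EntityDeclNames, declPrefix) are hypothetical and not 
part of SAX or XNI:

```java
final class EntityDeclNames {
    /**
     * Builds the start of an entity declaration from a name reported in
     * the SAX/XNI style, where parameter entities carry a leading '%'.
     */
    static String declPrefix(String reportedName) {
        if (reportedName.startsWith("%")) {
            // Parameter entity: the '%' must be a separate token.
            return "<!ENTITY % " + reportedName.substring(1);
        }
        return "<!ENTITY " + reportedName;
    }

    public static void main(String[] args) {
        System.out.println(declPrefix("%BigEntity")); // <!ENTITY % BigEntity
        System.out.println(declPrefix("BigEntity"));  // <!ENTITY BigEntity
    }
}
```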

> thanks,
> 
> Jake

[1] http://xerces.apache.org/xerces2-j/javadocs/xni/org/apache/xerces/xni/XMLDTDHandler.html#startParameterEntity(java.lang.String, org.apache.xerces.xni.XMLResourceIdentifier, java.lang.String, org.apache.xerces.xni.Augmentations)
[2] http://xerces.apache.org/xerces2-j/javadocs/xni/org/apache/xerces/xni/XMLDTDHandler.html#endParameterEntity(java.lang.String, org.apache.xerces.xni.Augmentations)
[3] http://xerces.apache.org/xerces2-j/javadocs/api/org/xml/sax/ext/LexicalHandler.html#startEntity(java.lang.String)

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org

