Posted to general@xml.apache.org by Jacob Kjome <ho...@visi.com> on 2006/04/04 05:49:34 UTC

how do I detect internal subset when part of external subset?

If I have a document that looks like this...

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE DIALOG SYSTEM "VoxML.dtd" [
	<!ENTITY ServletURL 'http://127.0.0.1:8080'>
]>
<DIALOG BARGEIN="N">
	<STEP NAME="init" PARENT="init">
		<PROMPT>Greeting for &ServletURL;</PROMPT>
		<INPUT TYPE="HIDDEN" NAME="Action" VALUE="launchEmailApplication"/>
		<INPUT TYPE="NONE" 
NEXT="&ServletURL;/servlet/VoxSurf.Architecture.VoxML.VxsVoxMLApplicationServlet"/>
	</STEP>
</DIALOG>


How do I tell the internal subset apart from the external 
subset?  Normally I'd do...

public void doctypeDecl(String rootElement, String publicId,
        String systemId, Augmentations augs) throws XNIException {
    // No public or system ID: assume everything is internal subset.
    if (publicId == null && systemId == null) {
        fProcessingState = PROCESSING_INTERNAL_SUBSET;
        fInternalSubset = new StringBuffer();
    } else {
        fProcessingState = PROCESSING_EXTERNAL_SUBSET;
    }
}

However, in this case the system ID is not null, so I default to the
external subset processing state.  In the other DTD event methods, I
check which state I'm in and only append to the internal subset
buffer when I'm in the internal subset processing state.  If I build
a document from this, the output excludes the <!ENTITY> declaration
that is actually part of the internal subset.  Parsing the resulting
document then fails because the document references
"&ServletURL;", but the <!ENTITY> declaration does not exist.

So, is there some unique condition I can look for while using the XNI 
parser to determine if I am parsing the internal subset when the 
external and internal subsets are combined?
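
To make the premise concrete, here is a minimal sketch that uses the JDK's
built-in DOM parser instead of XNI (that swap is an assumption made only so
the example is self-contained). It shows that a DOCTYPE can carry a system
ID and an internal subset at the same time, which is why a null check on the
IDs cannot tell the two subsets apart. The class name DoctypeCheck and the
in-memory stand-in for VoxML.dtd are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DocumentType;
import org.xml.sax.InputSource;

public class DoctypeCheck {
    // Trimmed-down version of the document from the post.
    public static final String SAMPLE =
        "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE DIALOG SYSTEM \"VoxML.dtd\" [\n"
        + "  <!ENTITY ServletURL 'http://127.0.0.1:8080'>\n"
        + "]>\n"
        + "<DIALOG><PROMPT>Greeting for &ServletURL;</PROMPT></DIALOG>";

    public static DocumentType parseDoctype(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Hypothetical stand-in: serve an empty external subset from
            // memory so that no VoxML.dtd file is needed on disk.
            builder.setEntityResolver((publicId, systemId) ->
                new InputSource(new StringReader("<!-- empty external subset -->")));
            return builder.parse(new InputSource(new StringReader(xml))).getDoctype();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        DocumentType doctype = parseDoctype(SAMPLE);
        // Both values are non-null for this document.
        System.out.println("systemId       = " + doctype.getSystemId());
        System.out.println("internalSubset = " + doctype.getInternalSubset());
    }
}
```

Both getSystemId() and getInternalSubset() should come back non-null here,
so the subset boundary has to be detected from parser events rather than
from the IDs passed to doctypeDecl().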


Jake


---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscribe@xml.apache.org
For additional commands, e-mail: general-help@xml.apache.org


Re: how do I detect internal subset when part of external subset?

Posted by Jacob Kjome <ho...@visi.com>.
At 05:39 AM 4/7/2006, you wrote:
 >Jacob Kjome wrote:
 >
 >> In any case, I think I've got the internal subset stuff working, except
 >> for one thing.  Take the following document...
 >>
 >> <?xml version="1.0" standalone="no"?>
 >> <!DOCTYPE document SYSTEM "document.dtd" [
 >>   <!ENTITY head SYSTEM "header.xml">
 >>   <!ENTITY foot SYSTEM "footer.xml">
 >>   <!ENTITY torso SYSTEM "body.xml">
 >>   <!ENTITY erh "Elliotte Rusty Harold">
 >> ]>
 >> <document>
 >>   &head; &torso; &foot;
 >> </document>
 >>
 >> The only part of this that ends up in the internal subset is the "erh"
 >> entity.  That is, the internalEntityDecl() method gets called only for
 >> the "erh" entity and is not notified at all for the other entities.
 >
 >
 >The first three are external entity declarations. (i.e., even though
 >they're in the internal DTD subset, they declare external entities.)
 >Perhaps there's an externalEntityDecl method hiding somewhere?
 >

Yep, that's what confused me.  I figured it out shortly after I sent the email.

later,

Jake

 >--
 >Elliotte Rusty Harold  elharo@metalab.unc.edu
 >XML in a Nutshell 3rd Edition Just Published!
 >http://www.cafeconleche.org/books/xian3/
 >http://www.amazon.com/exec/obidos/ISBN=0596007647/cafeaulaitA/ref=nosim
 >


Re: how do I detect internal subset when part of external subset?

Posted by Elliotte Harold <el...@metalab.unc.edu>.
Jacob Kjome wrote:

> In any case, I think I've got the internal subset stuff working, except 
> for one thing.  Take the following document...
> 
> <?xml version="1.0" standalone="no"?>
> <!DOCTYPE document SYSTEM "document.dtd" [
>   <!ENTITY head SYSTEM "header.xml">
>   <!ENTITY foot SYSTEM "footer.xml">
>   <!ENTITY torso SYSTEM "body.xml">
>   <!ENTITY erh "Elliotte Rusty Harold">
> ]>
> <document>
>   &head; &torso; &foot;
> </document>
> 
> The only part of this that ends up in the internal subset is the "erh" 
> entity.  That is, the internalEntityDecl() method gets called only for 
> the "erh" entity and is not notified at all for the other entities.  


The first three are external entity declarations. (i.e., even though
they're in the internal DTD subset, they declare external entities.)
Perhaps there's an externalEntityDecl method hiding somewhere?

-- 
Elliotte Rusty Harold  elharo@metalab.unc.edu
XML in a Nutshell 3rd Edition Just Published!
http://www.cafeconleche.org/books/xian3/
http://www.amazon.com/exec/obidos/ISBN=0596007647/cafeaulaitA/ref=nosim



Re: how do I detect internal subset when part of external subset?

Posted by Jacob Kjome <ho...@visi.com>.
Hi Michael,

I just figured that out shortly after I sent the 
email.  Just didn't get a chance to reply before 
you sent yours.  Sorry about that.  It always 
seems that I figure it out right after I hit the 
"send" button.  Thanks for the references.

later,

Jake

At 10:32 PM 4/6/2006, you wrote:
 >Hi Jacob,
 >
 ><!ENTITY head SYSTEM "header.xml">
 ><!ENTITY foot SYSTEM "footer.xml">
 ><!ENTITY torso SYSTEM "body.xml">
 >
 >are external entity declarations [1][2]. They are reported by
 >XMLDTDHandler.externalEntityDecl() in XNI and DeclHandler.
 >externalEntityDecl() in SAX.
 >
 >Thanks.
 >
 >[1] http://www.w3.org/TR/2004/REC-xml-20040204/#sec-entity-decl
 >[2] http://www.w3.org/TR/2004/REC-xml-20040204/#sec-external-ent
 >
 >Michael Glavassevich
 >XML Parser Development
 >IBM Toronto Lab
 >E-mail: mrglavas@ca.ibm.com
 >E-mail: mrglavas@apache.org
 >


Re: how do I detect internal subset when part of external subset?

Posted by Michael Glavassevich <mr...@ca.ibm.com>.
Hi Jacob,

<!ENTITY head SYSTEM "header.xml">
<!ENTITY foot SYSTEM "footer.xml">
<!ENTITY torso SYSTEM "body.xml">

are external entity declarations [1][2]. They are reported by 
XMLDTDHandler.externalEntityDecl() in XNI and DeclHandler.
externalEntityDecl() in SAX.

Thanks.

[1] http://www.w3.org/TR/2004/REC-xml-20040204/#sec-entity-decl
[2] http://www.w3.org/TR/2004/REC-xml-20040204/#sec-external-ent
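
As an illustration of the SAX side of this, the following is a minimal
sketch that records both kinds of entity declaration through DeclHandler.
It assumes the JDK's built-in SAX parser; the class name
EntityDeclCollector and the in-memory replacements for document.dtd,
header.xml, footer.xml and body.xml are hypothetical, added only so the
example runs without any files on disk.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

public class EntityDeclCollector extends DefaultHandler2 {
    public static final String SAMPLE =
        "<?xml version=\"1.0\" standalone=\"no\"?>\n"
        + "<!DOCTYPE document SYSTEM \"document.dtd\" [\n"
        + "  <!ENTITY head SYSTEM \"header.xml\">\n"
        + "  <!ENTITY foot SYSTEM \"footer.xml\">\n"
        + "  <!ENTITY torso SYSTEM \"body.xml\">\n"
        + "  <!ENTITY erh \"Elliotte Rusty Harold\">\n"
        + "]>\n"
        + "<document>&head; &torso; &foot;</document>";

    // Declarations in the order the parser reports them.
    public final List<String> decls = new ArrayList<>();
    // System IDs of external entities; the parser may report them as
    // absolute URIs rather than as written in the document.
    public final Map<String, String> systemIds = new HashMap<>();

    @Override
    public void internalEntityDecl(String name, String value) {
        decls.add("internal:" + name);
    }

    @Override
    public void externalEntityDecl(String name, String publicId, String systemId) {
        decls.add("external:" + name);
        systemIds.put(name, systemId);
    }

    // Hypothetical stand-in: answer every external lookup from memory.
    @Override
    public InputSource resolveEntity(String name, String publicId,
                                     String baseURI, String systemId) {
        boolean isDtd = systemId != null && systemId.endsWith(".dtd");
        return new InputSource(new StringReader(
            isDtd ? "<!-- empty external subset -->" : "placeholder text"));
    }

    public static EntityDeclCollector collect(String xml) {
        try {
            EntityDeclCollector handler = new EntityDeclCollector();
            XMLReader reader =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(handler);
            reader.setEntityResolver(handler);
            reader.setProperty(
                "http://xml.org/sax/properties/declaration-handler", handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return handler;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        EntityDeclCollector result = collect(SAMPLE);
        System.out.println(result.decls);
        System.out.println(result.systemIds);
    }
}
```

With both callbacks implemented, all four declarations from the internal
subset are seen, so a serializer has what it needs to write them back out.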

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org

Jacob Kjome <ho...@visi.com> wrote on 04/06/2006 11:07:57 PM:

> 
> Thanks for the tip, Elliotte.  I'll remember it 
> when I use SAX.  I'm using XNI in this case.  I 
> suppose I could use SAX, but I'm really just 
> trying to migrate from Xerces1 to Xerces2 for 
> XMLC.  XMLC already depends directly on Xerces 
> because of the custom DOM's XMLC implements.  I 
> also wanted to change as little as possible.  I 
> may make more radical changes once I've proven 
> that I can make things work properly with minimal changes.
> 
> In any case, I think I've got the internal subset 
> stuff working, except for one thing.  Take the following document...
> 
> <?xml version="1.0" standalone="no"?>
> <!DOCTYPE document SYSTEM "document.dtd" [
>    <!ENTITY head SYSTEM "header.xml">
>    <!ENTITY foot SYSTEM "footer.xml">
>    <!ENTITY torso SYSTEM "body.xml">
>    <!ENTITY erh "Elliotte Rusty Harold">
> ]>
> <document>
>    &head; &torso; &foot;
> </document>
> 
> The only part of this that ends up in the 
> internal subset is the "erh" entity.  That is, 
> the internalEntityDecl() method gets called only 
> for the "erh" entity and is not notified at all 
> for the other entities.  Then, as I build up the 
> DOM, I create EntityReference's for "&head; 
> &torso; &foot;" in the <document>.  Upon 
> serialization, they end up being there in the 
> document, but since I was never notified to 
> create the corresponding <!ENTITY> elements in 
> the internal subset, re-parsing of the serialized 
> document fails.  So, how do I get notified about 
> these so I can get them into the DOM unparsed?  I 
> want the serialized DOM to look as identical as 
> possible to the above.  I must be missing something.
> 
> 
> Jake
> 
> 


Re: how do I detect internal subset when part of external subset?

Posted by Jacob Kjome <ho...@visi.com>.
Thanks for the tip, Elliotte.  I'll remember it when I use SAX.  I'm
using XNI in this case.  I suppose I could use SAX, but I'm really just
trying to migrate XMLC from Xerces1 to Xerces2.  XMLC already depends
directly on Xerces because of the custom DOMs that XMLC implements, and
I wanted to change as little as possible.  I may make more radical
changes once I've proven that I can make things work properly with
minimal changes.

In any case, I think I've got the internal subset 
stuff working, except for one thing.  Take the following document...

<?xml version="1.0" standalone="no"?>
<!DOCTYPE document SYSTEM "document.dtd" [
   <!ENTITY head SYSTEM "header.xml">
   <!ENTITY foot SYSTEM "footer.xml">
   <!ENTITY torso SYSTEM "body.xml">
   <!ENTITY erh "Elliotte Rusty Harold">
]>
<document>
   &head; &torso; &foot;
</document>

The only part of this that ends up in the internal subset is the "erh"
entity.  That is, internalEntityDecl() gets called only for the "erh"
entity, and I am not notified at all about the other entities.  Then,
as I build up the DOM, I create EntityReference nodes for "&head;
&torso; &foot;" in the <document> element.  Upon serialization, the
references are still in the document, but since I was never notified to
create the corresponding <!ENTITY> declarations in the internal subset,
re-parsing the serialized document fails.  So, how do I get notified
about these declarations so I can get them into the DOM unparsed?  I
want the serialized DOM to look as close as possible to the above.  I
must be missing something.


Jake


At 06:41 AM 4/4/2006, you wrote:
 >The trick is to look for the entity name "[dtd]". XOM accomplishes this
 >thusly using pure SAX:
 >
 >
 >     protected boolean inExternalSubset = false;
 >
 >     // We have a problem here. Xerces gets this right,
 >     // but Crimson and possibly other parsers don't properly
 >     // report these entities, or perhaps just not tag them
 >     // with [dtd] like they're supposed to.
 >     public void startEntity(String name) {
 >       if (name.equals("[dtd]")) inExternalSubset = true;
 >     }
 >
 >
 >     public void endEntity(String name) {
 >       if (name.equals("[dtd]")) inExternalSubset = false;
 >     }
 >
 >You can just reverse the logic if you prefer inInternalSubset.
 >
 >--
 >Elliotte Rusty Harold  elharo@metalab.unc.edu
 >XML in a Nutshell 3rd Edition Just Published!
 >http://www.cafeconleche.org/books/xian3/
 >http://www.amazon.com/exec/obidos/ISBN=0596007647/cafeaulaitA/ref=nosim
 >


Re: how do I detect internal subset when part of external subset?

Posted by Elliotte Harold <el...@metalab.unc.edu>.
The trick is to look for the entity name "[dtd]". XOM accomplishes this 
thusly using pure SAX:


     protected boolean inExternalSubset = false;

     // We have a problem here. Xerces gets this right,
     // but Crimson and possibly other parsers don't properly
     // report these entities, or perhaps just not tag them
     // with [dtd] like they're supposed to.
     public void startEntity(String name) {
       if (name.equals("[dtd]")) inExternalSubset = true;
     }


     public void endEntity(String name) {
       if (name.equals("[dtd]")) inExternalSubset = false;
     }

You can just reverse the logic if you prefer inInternalSubset.
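
For anyone who wants to try the "[dtd]" trick in isolation, here is a
minimal, self-contained sketch. It is not XOM's code: the class name
SubsetDetector, the sample document and the in-memory external subset are
hypothetical, and it assumes the JDK's built-in SAX parser, which reports
the external subset as an entity named "[dtd]".

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

public class SubsetDetector extends DefaultHandler2 {
    public static final String SAMPLE =
        "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE DIALOG SYSTEM \"VoxML.dtd\" [\n"
        + "  <!ENTITY ServletURL 'http://127.0.0.1:8080'>\n"
        + "]>\n"
        + "<DIALOG><PROMPT>Greeting for &ServletURL;</PROMPT></DIALOG>";

    private boolean inExternalSubset = false;
    public final List<String> internalDecls = new ArrayList<>();
    public final List<String> externalDecls = new ArrayList<>();

    // LexicalHandler: the external subset is bracketed by "[dtd]" events.
    @Override
    public void startEntity(String name) {
        if ("[dtd]".equals(name)) inExternalSubset = true;
    }

    @Override
    public void endEntity(String name) {
        if ("[dtd]".equals(name)) inExternalSubset = false;
    }

    // DeclHandler: file each declaration under the subset it came from.
    @Override
    public void internalEntityDecl(String name, String value) {
        (inExternalSubset ? externalDecls : internalDecls).add(name);
    }

    // Hypothetical stand-in for VoxML.dtd, served from memory.
    @Override
    public InputSource resolveEntity(String name, String publicId,
                                     String baseURI, String systemId) {
        return new InputSource(
            new StringReader("<!ENTITY fromDtd 'declared externally'>"));
    }

    public static SubsetDetector classify(String xml) {
        try {
            SubsetDetector handler = new SubsetDetector();
            XMLReader reader =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(handler);
            reader.setEntityResolver(handler);
            reader.setProperty(
                "http://xml.org/sax/properties/lexical-handler", handler);
            reader.setProperty(
                "http://xml.org/sax/properties/declaration-handler", handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return handler;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SubsetDetector result = classify(SAMPLE);
        System.out.println("internal: " + result.internalDecls);
        System.out.println("external: " + result.externalDecls);
    }
}
```

Parsers that do not report "[dtd]" would leave everything in the internal
list, so the result is only as reliable as the parser's LexicalHandler
support.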

-- 
Elliotte Rusty Harold  elharo@metalab.unc.edu
XML in a Nutshell 3rd Edition Just Published!
http://www.cafeconleche.org/books/xian3/
http://www.amazon.com/exec/obidos/ISBN=0596007647/cafeaulaitA/ref=nosim



Re: how do I detect internal subset when part of external subset?

Posted by Jacob Kjome <ho...@visi.com>.
I think I answered my own question.  I am no longer using doctypeDecl 
for subset detection.  I am now doing the following.  It seems to 
work.  Can anyone tell me if this is the best way to do this?...

     public void startDTD(XMLLocator locator, Augmentations augs)
             throws XNIException {
         // Initially, assume internal subset until
         // startExternalSubset() says otherwise.
         fProcessingState = PROCESSING_INTERNAL_SUBSET;
         // TODO - is it possible to avoid the StringBuffer in the
         // case of no internal subset?
         fInternalSubset = new StringBuffer();
     }

     public void startExternalSubset(XMLResourceIdentifier identifier,
             Augmentations augs) throws XNIException {
         fProcessingState = PROCESSING_EXTERNAL_SUBSET;
     }

     public void endDTD(Augmentations augs) throws XNIException {
         if (fInternalSubset.length() > 0) {
             fDocBuilder.setInternalSubset(fInternalSubset.toString());
         }
         fProcessingState = PROCESSING_DOCUMENT;
     }
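
On the TODO above, one possible answer is to allocate the buffer lazily.
The sketch below models the same three-state logic in plain Java with no
XNI types, so it runs on its own; the class DtdSubsetTracker and its
declaration() method are hypothetical stand-ins for the real handler and
its DTD event methods. It assumes, as the code above does, that all
internal subset events arrive before startExternalSubset().

```java
public class DtdSubsetTracker {
    enum State { DOCUMENT, INTERNAL_SUBSET, EXTERNAL_SUBSET }

    private State state = State.DOCUMENT;
    // Stays null unless an internal subset declaration is actually seen.
    private StringBuilder internalSubset;

    public void startDTD() {
        // Assume internal subset until startExternalSubset() says otherwise.
        state = State.INTERNAL_SUBSET;
        internalSubset = null;
    }

    public void startExternalSubset() {
        state = State.EXTERNAL_SUBSET;
    }

    // Hypothetical stand-in for the individual DTD event methods
    // (entity, element and attribute-list declarations, and so on).
    public void declaration(String text) {
        if (state != State.INTERNAL_SUBSET) {
            return; // external subset content is not recorded
        }
        if (internalSubset == null) {
            internalSubset = new StringBuilder(); // first use: allocate now
        }
        internalSubset.append(text).append('\n');
    }

    // Returns the collected internal subset, or null if there was none.
    public String endDTD() {
        state = State.DOCUMENT;
        return internalSubset == null ? null : internalSubset.toString();
    }

    public static void main(String[] args) {
        DtdSubsetTracker tracker = new DtdSubsetTracker();
        tracker.startDTD();
        tracker.declaration("<!ENTITY ServletURL 'http://127.0.0.1:8080'>");
        tracker.startExternalSubset();
        tracker.declaration("<!ELEMENT DIALOG ANY>");
        System.out.print(tracker.endDTD());
    }
}
```

The caller then treats a null result from endDTD() the same way the
original treats an empty buffer, and skips setInternalSubset().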


Jake
