Posted to general@xml.apache.org by Jacob Kjome <ho...@visi.com> on 2006/04/07 06:33:53 UTC

Re: how do I detect internal subset when part of external subset?

Hi Michael,

I figured that out shortly after I sent the
email; I just didn't get a chance to reply before
you sent yours.  Sorry about that.  It always 
seems that I figure it out right after I hit the 
"send" button.  Thanks for the references.

later,

Jake

At 10:32 PM 4/6/2006, you wrote:
 >Hi Jacob,
 >
 ><!ENTITY head SYSTEM "header.xml">
 ><!ENTITY foot SYSTEM "footer.xml">
 ><!ENTITY torso SYSTEM "body.xml">
 >
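 >are external entity declarations [1][2], even though they appear in
 >the internal subset, so internalEntityDecl() is never called for
 >them. They are reported by XMLDTDHandler.externalEntityDecl() in XNI
 >and by DeclHandler.externalEntityDecl() in SAX.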
 >
 >Thanks.
 >
 >[1] http://www.w3.org/TR/2004/REC-xml-20040204/#sec-entity-decl
 >[2] http://www.w3.org/TR/2004/REC-xml-20040204/#sec-external-ent
 >
 >Michael Glavassevich
 >XML Parser Development
 >IBM Toronto Lab
 >E-mail: mrglavas@ca.ibm.com
 >E-mail: mrglavas@apache.org
 >
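Michael's point can be checked with the SAX API bundled with the JDK. The sketch below (the class name EntityDeclReport is hypothetical) parses Jake's sample document, registers a DeclHandler through the standard http://xml.org/sax/properties/declaration-handler property, and logs which callback reports each entity. It assumes document.dtd, header.xml, footer.xml, and body.xml are not available, so its resolver returns empty input for every external entity. With the JDK's built-in parser it should report head, foot, and torso through externalEntityDecl() and only erh through internalEntityDecl(), even though all four declarations sit in the internal subset.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical demo class: logs which DeclHandler callback reports each entity.
class EntityDeclReport extends DefaultHandler2 {
    // Jake's sample document from this thread.
    static final String SAMPLE =
        "<?xml version=\"1.0\" standalone=\"no\"?>\n"
        + "<!DOCTYPE document SYSTEM \"document.dtd\" [\n"
        + "  <!ENTITY head SYSTEM \"header.xml\">\n"
        + "  <!ENTITY foot SYSTEM \"footer.xml\">\n"
        + "  <!ENTITY torso SYSTEM \"body.xml\">\n"
        + "  <!ENTITY erh \"Elliotte Rusty Harold\">\n"
        + "]>\n"
        + "<document>\n  &head; &torso; &foot;\n</document>\n";

    final List<String> events = new ArrayList<String>();

    // Called for entities whose value is given inline, such as erh.
    @Override
    public void internalEntityDecl(String name, String value) {
        events.add("internal " + name);
    }

    // Called for entities declared with SYSTEM or PUBLIC, such as head.
    @Override
    public void externalEntityDecl(String name, String publicId, String systemId) {
        events.add("external " + name);
    }

    // Assumption: the referenced files are unavailable, so every external
    // entity (including document.dtd) is read as empty input.
    @Override
    public InputSource resolveEntity(String name, String publicId,
                                     String baseURI, String systemId) {
        return new InputSource(new StringReader(""));
    }

    static List<String> report(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            EntityDeclReport handler = new EntityDeclReport();
            // DeclHandler events are only delivered when registered via this property.
            parser.setProperty("http://xml.org/sax/properties/declaration-handler", handler);
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.events;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        for (String event : report(SAMPLE)) {
            System.out.println(event);
        }
    }
}
```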
 >Jacob Kjome <ho...@visi.com> wrote on 04/06/2006 11:07:57 PM:
 >
 >>
 >> Thanks for the tip, Elliotte.  I'll remember it
 >> when I use SAX.  I'm using XNI in this case.  I
 >> suppose I could use SAX, but I'm really just
 >> trying to migrate from Xerces1 to Xerces2 for
 >> XMLC.  XMLC already depends directly on Xerces
 >> because of the custom DOMs XMLC implements.  I
 >> also wanted to change as little as possible.  I
 >> may make more radical changes once I've proven
 >> that I can make things work properly with minimal changes.
 >>
 >> In any case, I think I've got the internal subset
 >> stuff working, except for one thing.  Take the following document...
 >>
 >> <?xml version="1.0" standalone="no"?>
 >> <!DOCTYPE document SYSTEM "document.dtd" [
 >>    <!ENTITY head SYSTEM "header.xml">
 >>    <!ENTITY foot SYSTEM "footer.xml">
 >>    <!ENTITY torso SYSTEM "body.xml">
 >>    <!ENTITY erh "Elliotte Rusty Harold">
 >> ]>
 >> <document>
 >>    &head; &torso; &foot;
 >> </document>
 >>
 >> The only part of this that ends up in the
 >> internal subset is the "erh" entity.  That is,
 >> the internalEntityDecl() method gets called only
 >> for the "erh" entity and is not notified at all
 >> for the other entities.  Then, as I build up the
 >> DOM, I create EntityReference nodes for "&head;
 >> &torso; &foot;" in the <document>.  Upon
 >> serialization, they end up being there in the
 >> document, but since I was never notified to
 >> create the corresponding <!ENTITY> declarations in
 >> the internal subset, re-parsing the serialized
 >> document fails.  So, how do I get notified about
 >> these so I can get them into the DOM unparsed?  I
 >> want the serialized DOM to look as close as
 >> possible to the above.  I must be missing something.
 >>
 >>
 >> Jake
 >>
 >>
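For the serialization half of Jake's question, the reported values have to be turned back into declaration text. A minimal sketch, assuming a hypothetical EntityDeclWriter helper that is not part of Xerces or XMLC. One caveat: the SAX2 DeclHandler contract has parsers resolve system identifiers to absolute URIs before reporting them, so "header.xml" may come back as a full file: URI unless the http://xml.org/sax/features/resolve-dtd-uris feature is turned off (where the parser supports it); in XNI, XMLResourceIdentifier.getLiteralSystemId() should give the identifier as written.

```java
// Hypothetical helper (not part of Xerces or XMLC): turns the values reported
// for an entity declaration back into internal-subset markup.
class EntityDeclWriter {
    // <!ENTITY name "value">. '%' and '"' become character references so the
    // literal stays well-formed; '&' is left alone on the assumption that the
    // replacement text only contains well-formed references.
    static String internal(String name, String value) {
        String escaped = value.replace("%", "&#37;").replace("\"", "&#34;");
        return "<!ENTITY " + name + " \"" + escaped + "\">";
    }

    // <!ENTITY name SYSTEM "uri"> or <!ENTITY name PUBLIC "pubid" "uri">.
    static String external(String name, String publicId, String systemId) {
        StringBuilder sb = new StringBuilder("<!ENTITY ").append(name);
        if (publicId != null) {
            sb.append(" PUBLIC ").append(quote(publicId)).append(' ');
        } else {
            sb.append(" SYSTEM ");
        }
        return sb.append(quote(systemId)).append('>').toString();
    }

    // A literal may not contain the quote character that delimits it, so use
    // single quotes when the text itself contains a double quote.
    private static String quote(String literal) {
        char q = literal.indexOf('"') >= 0 ? '\'' : '"';
        return q + literal + q;
    }

    public static void main(String[] args) {
        System.out.println(external("head", null, "header.xml"));
        System.out.println(external("foot", null, "footer.xml"));
        System.out.println(external("torso", null, "body.xml"));
        System.out.println(internal("erh", "Elliotte Rusty Harold"));
    }
}
```

With Jake's sample, external("head", null, "header.xml") returns <!ENTITY head SYSTEM "header.xml"> and internal("erh", "Elliotte Rusty Harold") returns <!ENTITY erh "Elliotte Rusty Harold">, matching the original declarations.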
 >> At 06:41 AM 4/4/2006, you wrote:
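 >>  >The trick is to look for the entity name "[dtd]", which SAX
 >>  >parsers pass to LexicalHandler.startEntity() and endEntity() around
 >>  >the external DTD subset. XOM accomplishes this thusly using pure SAX: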
 >>  >
 >>  >
 >>  >     protected boolean inExternalSubset = false;
 >>  >
 >>  >     // We have a problem here. Xerces gets this right,
 >>  >     // but Crimson and possibly other parsers don't properly
 >>  >     // report these entities, or perhaps just don't tag them
 >>  >     // with [dtd] like they're supposed to.
 >>  >     public void startEntity(String name) {
 >>  >       if (name.equals("[dtd]")) inExternalSubset = true;
 >>  >     }
 >>  >
 >>  >
 >>  >     public void endEntity(String name) {
 >>  >       if (name.equals("[dtd]")) inExternalSubset = false;
 >>  >     }
 >>  >
 >>  >You can just reverse the logic if you prefer inInternalSubset.
 >>  >
 >>  >--
 >>  >Elliotte Rusty Harold  elharo@metalab.unc.edu
 >>  >XML in a Nutshell 3rd Edition Just Published!
 >>  >http://www.cafeconleche.org/books/xian3/
 >>  >http://www.amazon.com/exec/obidos/ISBN=0596007647/cafeaulaitA/ref=nosim
 >>  >
 >>  >---------------------------------------------------------------------
 >>  >To unsubscribe, e-mail: general-unsubscribe@xml.apache.org
 >>  >For additional commands, e-mail: general-help@xml.apache.org
 >>  >
 >>  >
 >>  >
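Elliotte's fragment can be fleshed out into a complete handler. The sketch below (InternalSubsetFilter is a hypothetical name) extends org.xml.sax.ext.DefaultHandler2, which implements both LexicalHandler and DeclHandler, keeps his inExternalSubset flag, and records only the entity declarations seen outside the "[dtd]" pseudo-entity. To have something to filter out, it assumes document.dtd contains one made-up declaration, fromDtd, supplied by the resolver; every other external entity resolves to empty input. As his comment warns, this relies on the parser reporting "[dtd]"; the JDK's built-in Xerces-derived parser does, but other parsers may not.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical demo class: records only the entities declared in the
// internal subset, using the "[dtd]" pseudo-entity to spot the external one.
class InternalSubsetFilter extends DefaultHandler2 {
    // Jake's sample document from this thread.
    static final String SAMPLE =
        "<?xml version=\"1.0\" standalone=\"no\"?>\n"
        + "<!DOCTYPE document SYSTEM \"document.dtd\" [\n"
        + "  <!ENTITY head SYSTEM \"header.xml\">\n"
        + "  <!ENTITY foot SYSTEM \"footer.xml\">\n"
        + "  <!ENTITY torso SYSTEM \"body.xml\">\n"
        + "  <!ENTITY erh \"Elliotte Rusty Harold\">\n"
        + "]>\n"
        + "<document>\n  &head; &torso; &foot;\n</document>\n";

    private boolean inExternalSubset = false;
    final List<String> internalSubsetEntities = new ArrayList<String>();

    @Override
    public void startEntity(String name) {
        if (name.equals("[dtd]")) inExternalSubset = true;
    }

    @Override
    public void endEntity(String name) {
        if (name.equals("[dtd]")) inExternalSubset = false;
    }

    @Override
    public void internalEntityDecl(String name, String value) {
        if (!inExternalSubset) internalSubsetEntities.add(name);
    }

    @Override
    public void externalEntityDecl(String name, String publicId, String systemId) {
        if (!inExternalSubset) internalSubsetEntities.add(name);
    }

    // Assumption: document.dtd holds one made-up declaration ("fromDtd") so
    // there is something to filter out; every other external entity is empty.
    @Override
    public InputSource resolveEntity(String name, String publicId,
                                     String baseURI, String systemId) {
        boolean isDtd = systemId != null && systemId.endsWith(".dtd");
        String text = isDtd ? "<!ENTITY fromDtd \"declared externally\">" : "";
        return new InputSource(new StringReader(text));
    }

    static List<String> collect(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            InternalSubsetFilter handler = new InternalSubsetFilter();
            // Both SAX2 extension handlers must be registered explicitly.
            parser.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
            parser.setProperty("http://xml.org/sax/properties/declaration-handler", handler);
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.internalSubsetEntities;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(collect(SAMPLE));
    }
}
```

Tracking the flag in startEntity()/endEntity() keeps the handler stateless apart from one boolean, which is why reversing it into an inInternalSubset flag, as Elliotte suggests, is a one-line change.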
 >>
 >>
 >>
 >
 >

