Posted to general@xml.apache.org by Jacob Kjome <ho...@visi.com> on 2006/04/04 05:49:34 UTC
how do I detect internal subset when part of external subset?
If I have a document that looks like this...
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE DIALOG SYSTEM "VoxML.dtd" [
  <!ENTITY ServletURL 'http://127.0.0.1:8080'>
]>
<DIALOG BARGEIN="N">
  <STEP NAME="init" PARENT="init">
    <PROMPT>Greeting for &ServletURL;</PROMPT>
    <INPUT TYPE="HIDDEN" NAME="Action" VALUE="launchEmailApplication"/>
    <INPUT TYPE="NONE"
           NEXT="&ServletURL;/servlet/VoxSurf.Architecture.VoxML.VxsVoxMLApplicationServlet"/>
  </STEP>
</DIALOG>
How do I tell the internal subset apart from the external
subset? Normally I'd do...
public void doctypeDecl(String rootElement, String publicId,
        String systemId, Augmentations augs) throws XNIException {
    if (publicId == null && systemId == null) {
        fProcessingState = PROCESSING_INTERNAL_SUBSET;
        fInternalSubset = new StringBuffer();
    } else {
        fProcessingState = PROCESSING_EXTERNAL_SUBSET;
    }
}
However, in this case, the System Id is not null, so I default to the
external subset processing state. In the other DTD event methods, I
check which state I'm in and only append to the internal subset
buffer when I'm in the internal subset processing state. If I build
a document from this, the output excludes the <!ENTITY> declaration
inside what is, really, the internal subset. Thus, parsing of the
resulting document fails because the document references
"&ServletURL;", but the <!ENTITY> declaration does not exist.
So, is there some unique condition I can look for while using the XNI
parser to determine if I am parsing the internal subset when the
external and internal subsets are combined?
Jake
---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscribe@xml.apache.org
For additional commands, e-mail: general-help@xml.apache.org
Re: how do I detect internal subset when part of external subset?
Posted by Jacob Kjome <ho...@visi.com>.
At 05:39 AM 4/7/2006, you wrote:
>Jacob Kjome wrote:
>
>> In any case, I think I've got the internal subset stuff working, except
>> for one thing. Take the following document...
>>
>> <?xml version="1.0" standalone="no"?>
>> <!DOCTYPE document SYSTEM "document.dtd" [
>> <!ENTITY head SYSTEM "header.xml">
>> <!ENTITY foot SYSTEM "footer.xml">
>> <!ENTITY torso SYSTEM "body.xml">
>> <!ENTITY erh "Elliotte Rusty Harold">
>> ]>
>> <document>
>> &head; &torso; &foot;
>> </document>
>>
>> The only part of this that ends up in the internal subset is the "erh"
>> entity. That is, the internalEntityDecl() method gets called only for
>> the "erh" entity and is not notified at all for the other entities.
>
>
>The first three are external entity declarations. (i.e. even though
>they're in the internal DTD subset they declare external entities.)
>Perhaps there's an externalEntityDecl method hiding somewhere?
>
Yep, that's what confused me. I figured it out shortly after I sent the email.
later,
Jake
Re: how do I detect internal subset when part of external subset?
Posted by Elliotte Harold <el...@metalab.unc.edu>.
Jacob Kjome wrote:
> In any case, I think I've got the internal subset stuff working, except
> for one thing. Take the following document...
>
> <?xml version="1.0" standalone="no"?>
> <!DOCTYPE document SYSTEM "document.dtd" [
> <!ENTITY head SYSTEM "header.xml">
> <!ENTITY foot SYSTEM "footer.xml">
> <!ENTITY torso SYSTEM "body.xml">
> <!ENTITY erh "Elliotte Rusty Harold">
> ]>
> <document>
> &head; &torso; &foot;
> </document>
>
> The only part of this that ends up in the internal subset is the "erh"
> entity. That is, the internalEntityDecl() method gets called only for
> the "erh" entity and is not notified at all for the other entities.
The first three are external entity declarations. (i.e. even though
they're in the internal DTD subset they declare external entities.)
Perhaps there's an externalEntityDecl method hiding somewhere?
--
Elliotte Rusty Harold elharo@metalab.unc.edu
XML in a Nutshell 3rd Edition Just Published!
http://www.cafeconleche.org/books/xian3/
http://www.amazon.com/exec/obidos/ISBN=0596007647/cafeaulaitA/ref=nosim
Re: how do I detect internal subset when part of external subset?
Posted by Jacob Kjome <ho...@visi.com>.
Hi Michael,
I just figured that out shortly after I sent the
email. Just didn't get a chance to reply before
you sent yours. Sorry about that. It always
seems that I figure it out right after I hit the
"send" button. Thanks for the references.
later,
Jake
At 10:32 PM 4/6/2006, you wrote:
>Hi Jacob,
>
><!ENTITY head SYSTEM "header.xml">
><!ENTITY foot SYSTEM "footer.xml">
><!ENTITY torso SYSTEM "body.xml">
>
>are external entity declarations [1][2]. They are reported by
>XMLDTDHandler.externalEntityDecl() in XNI and
>DeclHandler.externalEntityDecl() in SAX.
>
>Thanks.
>
>[1] http://www.w3.org/TR/2004/REC-xml-20040204/#sec-entity-decl
>[2] http://www.w3.org/TR/2004/REC-xml-20040204/#sec-external-ent
>
>Michael Glavassevich
>XML Parser Development
>IBM Toronto Lab
>E-mail: mrglavas@ca.ibm.com
>E-mail: mrglavas@apache.org
Re: how do I detect internal subset when part of external subset?
Posted by Michael Glavassevich <mr...@ca.ibm.com>.
Hi Jacob,
<!ENTITY head SYSTEM "header.xml">
<!ENTITY foot SYSTEM "footer.xml">
<!ENTITY torso SYSTEM "body.xml">
are external entity declarations [1][2]. They are reported by
XMLDTDHandler.externalEntityDecl() in XNI and
DeclHandler.externalEntityDecl() in SAX.
Thanks.
[1] http://www.w3.org/TR/2004/REC-xml-20040204/#sec-entity-decl
[2] http://www.w3.org/TR/2004/REC-xml-20040204/#sec-external-ent
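A minimal sketch of the SAX side of this, assuming the JDK's bundled parser: a DeclHandler that records both kinds of entity declaration and rebuilds their text. The class name EntityDeclRecorder and its decls field are hypothetical, and the sample DOCTYPE has no external subset so that the example needs no files on disk. The quoting is deliberately naive (values containing a double quote, "%" or "&" would need escaping), and SAX may report the system identifier as an absolute URI rather than the literal "header.xml".

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical helper: rebuilds <!ENTITY ...> declarations from DeclHandler events.
final class EntityDeclRecorder extends DefaultHandler2 {
    final StringBuilder decls = new StringBuilder();

    // Parameter entities are reported with a leading '%' in their name.
    private static String declName(String name) {
        return name.startsWith("%") ? "% " + name.substring(1) : name;
    }

    @Override
    public void internalEntityDecl(String name, String value) {
        // Naive quoting: assumes the value contains no '"', '%' or '&'.
        decls.append("<!ENTITY ").append(declName(name))
             .append(" \"").append(value).append("\">\n");
    }

    @Override
    public void externalEntityDecl(String name, String publicId, String systemId) {
        decls.append("<!ENTITY ").append(declName(name));
        if (publicId != null) {
            decls.append(" PUBLIC \"").append(publicId).append("\" \"");
        } else {
            decls.append(" SYSTEM \"");
        }
        decls.append(systemId).append("\">\n");
    }

    // Parses the given document text and returns the recorder that observed it.
    static EntityDeclRecorder parse(String xml) {
        try {
            EntityDeclRecorder recorder = new EntityDeclRecorder();
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setProperty("http://xml.org/sax/properties/declaration-handler", recorder);
            reader.parse(new InputSource(new StringReader(xml)));
            return recorder;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\"?>\n"
                + "<!DOCTYPE document [\n"
                + "<!ENTITY head SYSTEM \"header.xml\">\n"
                + "<!ENTITY erh \"Elliotte Rusty Harold\">\n"
                + "]>\n"
                + "<document>&erh;</document>";
        System.out.print(parse(xml).decls);
    }
}
```

The external entity is declared but never referenced in the sample, so the parser should not try to load header.xml.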
Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org
Jacob Kjome <ho...@visi.com> wrote on 04/06/2006 11:07:57 PM:
>
> Thanks for the tip, Elliotte. I'll remember it
> when I use SAX. I'm using XNI in this case. I
> suppose I could use SAX, but I'm really just
> trying to migrate from Xerces1 to Xerces2 for
> XMLC. XMLC already depends directly on Xerces
> because of the custom DOMs XMLC implements. I
> also wanted to change as little as possible. I
> may make more radical changes once I've proven
> that I can make things work properly with minimal changes.
>
> In any case, I think I've got the internal subset
> stuff working, except for one thing. Take the following document...
>
> <?xml version="1.0" standalone="no"?>
> <!DOCTYPE document SYSTEM "document.dtd" [
> <!ENTITY head SYSTEM "header.xml">
> <!ENTITY foot SYSTEM "footer.xml">
> <!ENTITY torso SYSTEM "body.xml">
> <!ENTITY erh "Elliotte Rusty Harold">
> ]>
> <document>
> &head; &torso; &foot;
> </document>
>
> The only part of this that ends up in the
> internal subset is the "erh" entity. That is,
> the internalEntityDecl() method gets called only
> for the "erh" entity and is not notified at all
> for the other entities. Then, as I build up the
> DOM, I create EntityReferences for "&head;
> &torso; &foot;" in the <document>. Upon
> serialization, they end up being there in the
> document, but since I was never notified to
> create the corresponding <!ENTITY> declarations in
> the internal subset, re-parsing of the serialized
> document fails. So, how do I get notified about
> these so I can get them into the DOM unparsed? I
> want the serialized DOM to look as identical as
> possible to the above. I must be missing something.
>
>
> Jake
Re: how do I detect internal subset when part of external subset?
Posted by Jacob Kjome <ho...@visi.com>.
Thanks for the tip, Elliotte. I'll remember it
when I use SAX. I'm using XNI in this case. I
suppose I could use SAX, but I'm really just
trying to migrate from Xerces1 to Xerces2 for
XMLC. XMLC already depends directly on Xerces
because of the custom DOMs XMLC implements. I
also wanted to change as little as possible. I
may make more radical changes once I've proven
that I can make things work properly with minimal changes.
In any case, I think I've got the internal subset
stuff working, except for one thing. Take the following document...
<?xml version="1.0" standalone="no"?>
<!DOCTYPE document SYSTEM "document.dtd" [
<!ENTITY head SYSTEM "header.xml">
<!ENTITY foot SYSTEM "footer.xml">
<!ENTITY torso SYSTEM "body.xml">
<!ENTITY erh "Elliotte Rusty Harold">
]>
<document>
&head; &torso; &foot;
</document>
The only part of this that ends up in the
internal subset is the "erh" entity. That is,
the internalEntityDecl() method gets called only
for the "erh" entity and is not notified at all
for the other entities. Then, as I build up the
DOM, I create EntityReferences for "&head;
&torso; &foot;" in the <document>. Upon
serialization, they end up being there in the
document, but since I was never notified to
create the corresponding <!ENTITY> declarations in
the internal subset, re-parsing of the serialized
document fails. So, how do I get notified about
these so I can get them into the DOM unparsed? I
want the serialized DOM to look as identical as
possible to the above. I must be missing something.
Jake
At 06:41 AM 4/4/2006, you wrote:
>The trick is to look for the entity name "[dtd]". XOM accomplishes this
>thusly using pure SAX:
>
>
> protected boolean inExternalSubset = false;
>
> // We have a problem here. Xerces gets this right,
> // but Crimson and possibly other parsers don't properly
> // report these entities, or perhaps just not tag them
> // with [dtd] like they're supposed to.
> public void startEntity(String name) {
> if (name.equals("[dtd]")) inExternalSubset = true;
> }
>
>
> public void endEntity(String name) {
> if (name.equals("[dtd]")) inExternalSubset = false;
> }
>
>You can just reverse the logic if you prefer inInternalSubset.
Re: how do I detect internal subset when part of external subset?
Posted by Elliotte Harold <el...@metalab.unc.edu>.
The trick is to look for the entity name "[dtd]". XOM accomplishes this
thusly using pure SAX:
    protected boolean inExternalSubset = false;

    // We have a problem here. Xerces gets this right,
    // but Crimson and possibly other parsers don't properly
    // report these entities, or perhaps just not tag them
    // with [dtd] like they're supposed to.
    public void startEntity(String name) {
        if (name.equals("[dtd]")) inExternalSubset = true;
    }

    public void endEntity(String name) {
        if (name.equals("[dtd]")) inExternalSubset = false;
    }
You can just reverse the logic if you prefer inInternalSubset.
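A minimal, self-contained sketch of this idea against the JDK's bundled SAX parser might look as follows. The class name SubsetTracker, the FromDtd entity, and the in-memory stand-in for VoxML.dtd are assumptions made for illustration; this is not XOM's actual code. It relies on the parser reporting the external subset as the "[dtd]" entity through LexicalHandler, which, as the comment above warns, not every parser does.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical helper: sorts internal entity declarations by the DTD subset they came from.
final class SubsetTracker extends DefaultHandler2 {
    final List<String> internalDecls = new ArrayList<>();
    final List<String> externalDecls = new ArrayList<>();
    private boolean inExternalSubset = false;

    @Override
    public void startEntity(String name) {
        // SAX reports the external DTD subset as the pseudo-entity "[dtd]".
        if ("[dtd]".equals(name)) inExternalSubset = true;
    }

    @Override
    public void endEntity(String name) {
        if ("[dtd]".equals(name)) inExternalSubset = false;
    }

    @Override
    public void internalEntityDecl(String name, String value) {
        (inExternalSubset ? externalDecls : internalDecls).add(name);
    }

    @Override
    public InputSource resolveEntity(String name, String publicId,
                                     String baseURI, String systemId) {
        // Assumption: serve a one-line stand-in for the external DTD from memory,
        // so the sketch does not need a real VoxML.dtd on disk.
        return new InputSource(new StringReader(
                "<!ENTITY FromDtd 'declared in the external subset'>"));
    }

    // Parses the given document text and returns the tracker that observed it.
    static SubsetTracker parse(String xml) {
        try {
            SubsetTracker tracker = new SubsetTracker();
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setProperty("http://xml.org/sax/properties/lexical-handler", tracker);
            reader.setProperty("http://xml.org/sax/properties/declaration-handler", tracker);
            reader.setEntityResolver(tracker);
            reader.parse(new InputSource(new StringReader(xml)));
            return tracker;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\"?>\n"
                + "<!DOCTYPE DIALOG SYSTEM \"VoxML.dtd\" [\n"
                + "<!ENTITY ServletURL 'http://127.0.0.1:8080'>\n"
                + "]>\n"
                + "<DIALOG><PROMPT>Greeting for &ServletURL;</PROMPT></DIALOG>";
        SubsetTracker tracker = parse(xml);
        System.out.println("internal subset: " + tracker.internalDecls);
        System.out.println("external subset: " + tracker.externalDecls);
    }
}
```

The internal subset is read before the external one, so declarations seen before the "[dtd]" entity starts are attributed to the internal subset.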
--
Elliotte Rusty Harold elharo@metalab.unc.edu
XML in a Nutshell 3rd Edition Just Published!
http://www.cafeconleche.org/books/xian3/
http://www.amazon.com/exec/obidos/ISBN=0596007647/cafeaulaitA/ref=nosim
Re: how do I detect internal subset when part of external subset?
Posted by Jacob Kjome <ho...@visi.com>.
I think I answered my own question. I am no longer using doctypeDecl
for subset detection. I am now doing the following. It seems to
work. Can anyone tell me if this is the best way to do this?...
    public void startDTD(XMLLocator arg0, Augmentations arg1) throws XNIException {
        // Initially, assume internal subset until startExternalSubset() says otherwise.
        fProcessingState = PROCESSING_INTERNAL_SUBSET;
        // TODO - is it possible to avoid the StringBuffer in the case of no internal subset?
        fInternalSubset = new StringBuffer();
    }
    public void startExternalSubset(XMLResourceIdentifier identifier,
            Augmentations augs) throws XNIException {
        fProcessingState = PROCESSING_EXTERNAL_SUBSET;
    }
    public void endDTD(Augmentations augs) throws XNIException {
        if (fInternalSubset.length() > 0) {
            fDocBuilder.setInternalSubset(fInternalSubset.toString());
        }
        fProcessingState = PROCESSING_DOCUMENT;
    }
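On the TODO about avoiding the buffer when there is no internal subset, one possible approach is to allocate it lazily on the first append. The sketch below is plain Java with no Xerces dependency; the class InternalSubsetBuffer is hypothetical, and its methods only mirror the names of the three callbacks above so the state changes are easy to map back.

```java
// Hypothetical helper: collects internal-subset text, allocating the buffer only when needed.
final class InternalSubsetBuffer {
    private StringBuilder buffer;      // stays null until the first internal declaration
    private boolean inInternalSubset;

    // Mirrors startDTD(): assume the internal subset until told otherwise.
    void startDTD() {
        buffer = null;
        inInternalSubset = true;
    }

    // Mirrors startExternalSubset(): stop collecting.
    void startExternalSubset() {
        inInternalSubset = false;
    }

    // Called from each DTD declaration callback with the declaration's text.
    void append(String declaration) {
        if (!inInternalSubset) {
            return;
        }
        if (buffer == null) {
            buffer = new StringBuilder();
        }
        buffer.append(declaration);
    }

    // Mirrors endDTD(): returns the collected text, or null if nothing was collected.
    String endDTD() {
        inInternalSubset = false;
        return buffer == null ? null : buffer.toString();
    }

    public static void main(String[] args) {
        InternalSubsetBuffer subset = new InternalSubsetBuffer();
        subset.startDTD();
        subset.append("<!ENTITY ServletURL 'http://127.0.0.1:8080'>");
        subset.startExternalSubset();
        subset.append("<!ELEMENT DIALOG ANY>"); // ignored: belongs to the external subset
        System.out.println(subset.endDTD());
    }
}
```

A caller would then set the internal subset on the document only when endDTD() returns a non-null string, which replaces the length() > 0 check.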
Jake