Posted to j-users@xerces.apache.org by "Smith, Brook non Unisys" <Br...@unisys.com> on 2008/05/19 00:20:55 UTC

Processing instruction outside the root element

Hi,

 

We have an intermittent issue where a document with a
processing instruction outside the root element (at the very end of the
document) causes a parsing exception:

"org.xml.sax.SAXException: XML document structures must start and end
within the same entity."
 
If the processing instruction has white space after it, the document parses OK.
 
Any help you can give would be appreciated.
 
Thanks, Brook.

 


Re: Processing instruction outside the root element

Posted by Michael Glavassevich <mr...@ca.ibm.com>.
... and the likely bug is this [1]. Fixed in October 2004.

[1] http://issues.apache.org/jira/browse/XERCESJ-1016

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org

Michael Glavassevich/Toronto/IBM@IBMCA wrote on 05/19/2008 12:25:57 PM:

> keshlam@us.ibm.com wrote on 05/19/2008 11:53:06 AM:
>
> > I agree that it sounds like Xerces should
> > always be delivering this late PI to the application. Can we come up
> > with a small testcase that demonstrates a failure to do so?
>
> I vaguely remember fixing a problem like that years ago so it might
> only be reproducible with some old version of Xerces-J. The user
> didn't state what they were using. It might not even be an Apache
> version. We keep getting more and more bug reports for the Sun JDK
> fork of the codebase which has bugs which were fixed here long ago
> and some which never existed here in the first place.
>
> > ______________________________________
> > "... Three things see no end: A loop with exit code done wrong,
> > A semaphore untested, And the change that comes along. ..."
> >  -- "Threes" Rev 1.1 - Duane Elms / Leslie Fish
> > (http://www.ovff.org/pegasus/songs/threes-rev-11.html)
>
> Thanks.
>
> Michael Glavassevich
> XML Parser Development
> IBM Toronto Lab
> E-mail: mrglavas@ca.ibm.com
> E-mail: mrglavas@apache.org

Re: Processing instruction outside the root element

Posted by Michael Glavassevich <mr...@ca.ibm.com>.
keshlam@us.ibm.com wrote on 05/19/2008 11:53:06 AM:

> I agree that it sounds like Xerces should
> always be delivering this late PI to the application. Can we come up
> with a small testcase that demonstrates a failure to do so?

I vaguely remember fixing a problem like that years ago so it might only be
reproducible with some old version of Xerces-J. The user didn't state what
they were using. It might not even be an Apache version. We keep getting
more and more bug reports for the Sun JDK fork of the codebase which has
bugs which were fixed here long ago and some which never existed here in
the first place.

> ______________________________________
> "... Three things see no end: A loop with exit code done wrong,
> A semaphore untested, And the change that comes along. ..."
>  -- "Threes" Rev 1.1 - Duane Elms / Leslie Fish
> (http://www.ovff.org/pegasus/songs/threes-rev-11.html)

Thanks.

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org

Re: Processing instruction outside the root element

Posted by Eliot Kimber <ek...@reallysi.com>.
On 5/19/08 10:53 AM, "keshlam@us.ibm.com" <ke...@us.ibm.com> wrote:

>> That is not true. The definition of document in the XML 1.1 spec is:
>>       (  prolog  element  Misc*  )
> 
> Hmmm. You're right; my error. That's true even in 1.0.
> 
> 
> Tim Bray, in his Annotated XML Specification, said:
> 
> "The fact that you're allowed some trailing junk after the root element, I
> decided (but unfortunately too late) is a real design error in XML. If I'm
> writing a network client, I'm probably going to close the link as soon as
> I see the root element end-tag, and not depend on the other end closing
> it down properly.
> "Furthermore, if I want to send a succession of XML documents over a
> network link, if I find a processing instruction after a root element, is
> it a trailer on the previous document, or part of the prolog of the next?"

I would think that you'd pretty much have to assume that any PI following a
root element is part of the element's document since you would expect an XML
declaration to precede any PIs associated with any following document. That
reflects the fact that an XML declaration in a stream of concatenated
strings to be interpreted as XML documents is an unambiguous signal of
document boundaries.

If you want to transmit XML documents as sequences of characters and enable
multiple docs in a single literal string, you need to define a convention
for signaling their boundaries, which means either requiring the use of
XML declarations or defining some other signal as part of your protocol. It's
not the XML standard's place to define network protocols.
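
As an illustration of the first convention, here is a minimal sketch in Java
that splits a string of concatenated documents at each XML declaration. The
class and method names (XmlStreamSplitter, split) are hypothetical. It assumes
every document starts with an XML declaration written as "<?xml " followed by
a single space, and that this literal never occurs inside a comment or CDATA
section. Those assumptions belong to the protocol, not to XML itself.

```java
import java.util.ArrayList;
import java.util.List;

class XmlStreamSplitter {

    /**
     * Splits a string of concatenated XML documents at each XML declaration.
     * Assumption: every document begins with "<?xml " (with one space), and
     * that literal does not appear anywhere else in the stream.
     */
    static List<String> split(String stream) {
        final String marker = "<?xml ";
        List<String> docs = new ArrayList<>();
        int start = stream.indexOf(marker);
        while (start >= 0) {
            // The next declaration, if any, marks the end of the current document.
            int next = stream.indexOf(marker, start + marker.length());
            docs.add(next < 0 ? stream.substring(start)
                              : stream.substring(start, next));
            start = next;
        }
        return docs;
    }

    public static void main(String[] args) {
        String stream = "<?xml version=\"1.0\"?><a/><?mypi trailer?>"
                      + "<?xml version=\"1.0\"?><b/>";
        for (String doc : split(stream)) {
            System.out.println(doc);
        }
    }
}
```

With this convention the trailing PI stays with the first document, because
only an XML declaration starts a new one.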

And if you forget that the atomic unit of data in XML is the *document* and
not the *element* then you will come to grief, as Tim suggests above.

Cheers,

Eliot

-- 
Eliot Kimber
Senior Solutions Architect
"Bringing Strategy, Content, and Technology Together"
Main: 610.631.6770
www.reallysi.com
www.rsuitecms.com



---------------------------------------------------------------------
To unsubscribe, e-mail: j-users-unsubscribe@xerces.apache.org
For additional commands, e-mail: j-users-help@xerces.apache.org


Re: Processing instruction outside the root element

Posted by ke...@us.ibm.com.
>That is not true. The definition of document in the XML 1.1 spec is:
>       (  prolog  element  Misc*  )

Hmmm. You're right; my error. That's true even in 1.0.


Tim Bray, in his Annotated XML Specification, said:

"The fact that you're allowed some trailing junk after the root element, I 
decided (but unfortunately too late) is a real design error in XML. If I'm 
writing a network client, I'm probably going to close the link as soon as 
I see the root element end-tag, and not depend on the other end closing 
it down properly.
"Furthermore, if I want to send a succession of XML documents over a 
network link, if I find a processing instruction after a root element, is 
it a trailer on the previous document, or part of the prolog of the next?"

I can quibble with those -- the spec says you _can't_ reliably do the 
active shut-down nor concatenate multiple XML documents into a stream 
without running into the Misc* problem, so this may be a case of "if it 
hurts when you do that, don't do that." But I do think Best Practice is 
probably to avoid relying on PIs after the document unless you have 
complete control of the code on both ends of the wire.

Having said all that... I agree that it sounds like Xerces should always 
be delivering this late PI to the application. Can we come up with a small 
testcase that demonstrates a failure to do so?
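
A starting point for such a testcase might look like the sketch below. It
parses a document whose last characters are a PI, once without and once with
trailing white space, and collects the PI targets the parser reports. The
class and method names (TrailingPiTest, piTargets) are hypothetical, and the
code runs against whichever JAXP parser is configured, which is not
necessarily Apache Xerces-J. Since the reported failure is intermittent, a
document this small may well not trigger it; larger inputs or a streamed
InputStream may be needed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class TrailingPiTest {

    /** Parses the XML text and returns the targets of all PIs the parser reports. */
    static List<String> piTargets(String xml) throws Exception {
        final List<String> targets = new ArrayList<>();
        // Uses the JAXP default parser; put xercesImpl.jar on the classpath
        // to exercise Apache Xerces-J instead of the JDK's built-in parser.
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void processingInstruction(String target, String data) {
                targets.add(target);
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return targets;
    }

    public static void main(String[] args) throws Exception {
        String doc = "<?xml version=\"1.0\"?><x/><?mypi bar?>";
        // No white space after the trailing PI: the case reported as failing.
        System.out.println(piTargets(doc));
        // White space after the trailing PI: the case reported as working.
        System.out.println(piTargets(doc + "\n"));
    }
}
```

A conforming parser should report the mypi target in both cases. A
SAXException on the first call only would reproduce the report.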


______________________________________
"... Three things see no end: A loop with exit code done wrong,
A semaphore untested, And the change that comes along. ..."
  -- "Threes" Rev 1.1 - Duane Elms / Leslie Fish (
http://www.ovff.org/pegasus/songs/threes-rev-11.html)

Re: Processing instruction outside the root element

Posted by Eliot Kimber <ek...@reallysi.com>.
On 5/19/08 8:22 AM, "keshlam@us.ibm.com" <ke...@us.ibm.com> wrote:

>> processing instruction outside the root element (at the very end of the
>> document) 
> 
> By XML's grammar rules, nothing meaningful may follow the root element.
> That includes PIs.  Any tool which is processing that PI is actually
> behaving incorrectly.

That is not true. The definition of document in the XML 1.1 spec is:

       (  prolog  element  Misc*  )

Where "Misc" is:

     Comment | PI | S

So PIs and comments are absolutely allowed to follow the root element, just
as they are allowed to precede it.

This document is fine:

<?xml version="1.0"?>
<?mypi foo?>
<x/>
<?mypi bar?>
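
To see how a parser exposes that structure, here is a minimal sketch in Java
that loads the document above into a DOM and lists the children of the
Document node. The class and method names (DocumentChildren, topLevelNodes)
are hypothetical, and it uses whichever JAXP parser is configured.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DocumentChildren {

    /** Returns "kind:name" for each direct child of the Document node, in order. */
    static List<String> topLevelNodes(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        List<String> nodes = new ArrayList<>();
        for (Node n = doc.getFirstChild(); n != null; n = n.getNextSibling()) {
            String kind;
            switch (n.getNodeType()) {
                case Node.PROCESSING_INSTRUCTION_NODE: kind = "PI"; break;
                case Node.ELEMENT_NODE:                kind = "element"; break;
                default:                               kind = "other"; break;
            }
            // For a PI, getNodeName() is the PI target.
            nodes.add(kind + ":" + n.getNodeName());
        }
        return nodes;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\"?>\n"
                   + "<?mypi foo?>\n"
                   + "<x/>\n"
                   + "<?mypi bar?>\n";
        System.out.println(topLevelNodes(xml));
    }
}
```

A conforming parser should list a PI, the element x, and a second PI, in that
order. The XML declaration is not a node, and white space between top-level
constructs is not kept in the DOM.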

Cheers,

Eliot

-- 
Eliot Kimber
Senior Solutions Architect
"Bringing Strategy, Content, and Technology Together"
Main: 610.631.6770
www.reallysi.com
www.rsuitecms.com





Re: Processing instruction outside the root element

Posted by ke...@us.ibm.com.
> processing instruction outside the root element (at the very end of the 
> document) 

By XML's grammar rules, nothing meaningful may follow the root element. 
That includes PIs.  Any tool which is processing that PI is actually 
behaving incorrectly.

Fix your document design?

______________________________________
"... Three things see no end: A loop with exit code done wrong,
A semaphore untested, And the change that comes along. ..."
  -- "Threes" Rev 1.1 - Duane Elms / Leslie Fish (
http://www.ovff.org/pegasus/songs/threes-rev-11.html)