Posted to j-users@xerces.apache.org by "Hamilton, Kenneth" <kh...@fdr.follett.com> on 2009/11/09 21:05:45 UTC

trouble parsing a particular XML file

Hello.

(My first post to this list.  Please excuse any breach of manners on my
part.)

I have been trying to parse an XML file which is part of the IMS Common
Cartridge conformance suite.  The file encodes QTI elements and runs
about 129,000 lines of text.  In my environment, which is
Java/JBoss/Xerces, the Xerces parser fails with the message:

[Fatal Error] :129176:1: Content is not allowed in trailing section.

Nevertheless, Firefox and IE are able to parse it well enough to display
the tree structure.  Thinking it might be size-related, I edited the
file down by half, making sure the resulting file was well-formed, and
re-tried.  (The file was pretty flat, about 1500 QTI items, so it was
easy to do.)  That gave me:

[Fatal Error] :59603:3: The markup in the document following the root
element must be well-formed.

(Again Firefox and IE have no trouble.)

Halving again (resulting in the file I attach here) I get:

[Fatal Error] :29761:1: Content is not allowed in trailing section.

(Again, no problem for FF or IE.)

The next halving solves the problem.  Each half separately passes
through the parser.

I have not been able to figure out what my problem is.  I doubled my JVM
memory allocation, but this had no apparent effect.

Can anyone see what is going wrong here?

thanks!

Ken Hamilton

RE: trouble parsing a particular XML file

Posted by "Hamilton, Kenneth" <kh...@fdr.follett.com>.
Thanks, Michael, for the quick reply.  Your result narrows our
search for causes.  We'll try to isolate the cause further and
will report back here if we have some insight to share.
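
Both messages reported above describe content found after the root
element has closed, which fits the suggestion that something appends
extra bytes before the parser sees the document. One way to isolate
this is to run the exact bytes the application hands to the parser
through a bare well-formedness check. The following is only a sketch:
the class name StreamCheck, the method check, and the sample documents
are hypothetical, and it assumes the parser is obtained through JAXP.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class StreamCheck {
    /**
     * Returns null if the bytes parse as well-formed XML, otherwise the
     * parser's error as "line:column: message".
     */
    static String check(byte[] xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            // DefaultHandler rethrows fatal errors, so a malformed document
            // surfaces here as a SAXParseException.
            factory.newSAXParser().parse(new ByteArrayInputStream(xml), new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            return e.getLineNumber() + ":" + e.getColumnNumber() + ": " + e.getMessage();
        } catch (Exception e) {
            // Parser configuration or I/O failure: not a well-formedness result.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] clean = "<items><item/></items>\n".getBytes(StandardCharsets.UTF_8);
        // Simulates a stream wrapper that leaves stale text after the root element.
        byte[] corrupted = "<items><item/></items>\nstale".getBytes(StandardCharsets.UTF_8);
        System.out.println("clean:     " + check(clean));
        System.out.println("corrupted: " + check(corrupted));
    }
}
```

If the bytes read straight from disk pass this check but the bytes
captured from the application's own stream do not, the stream handling
is the likely culprit rather than the parser.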

---------------------------------------------------------------------
To unsubscribe, e-mail: j-users-unsubscribe@xerces.apache.org
For additional commands, e-mail: j-users-help@xerces.apache.org


Re: trouble parsing a particular XML file

Posted by Michael Glavassevich <mr...@ca.ibm.com>.
Hi Ken,

Seems fine to me. I cannot reproduce the error(s) you're seeing. Running it
through the sax.Counter sample it completes normally:

java sax.Counter file:///D:/xmldocs/objbank_b.xml
file:///D:/xmldocs/objbank_b.xml: 375 ms (18340 elems, 10035 attrs, 0
spaces, 735650 chars).

so I imagine there's something else going on here: for example, some
InputStream is corrupting the document before the parser sees it, or
you're using an old version of Xerces that had a bug, or the JDK fork (for
which I keep hearing about issues that Apache Xerces has never had).
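
To tell the last two possibilities apart, it can help to ask JAXP which
parser implementation it actually resolves inside the application
server. A minimal sketch, assuming the application obtains its parser
through SAXParserFactory (the original post does not say how the parser
is created); the class name ParserCheck and the method describeFactory
are hypothetical:

```java
import java.security.CodeSource;
import javax.xml.parsers.SAXParserFactory;

class ParserCheck {
    /** Describes the SAX parser factory JAXP resolves and where it was loaded from. */
    static String describeFactory() {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        Class<?> impl = factory.getClass();
        CodeSource source = impl.getProtectionDomain().getCodeSource();
        // Classes bundled with the JDK typically report no code source.
        String location = (source == null || source.getLocation() == null)
                ? "(no code source, probably the JDK runtime)"
                : source.getLocation().toString();
        return impl.getName() + " loaded from " + location;
    }

    public static void main(String[] args) {
        System.out.println(describeFactory());
    }
}
```

A class name beginning with com.sun.org.apache.xerces.internal indicates
the JDK's bundled fork, while org.apache.xerces.jaxp.SAXParserFactoryImpl
indicates Apache Xerces; in the latter case the reported jar location
shows which copy of Xerces was picked up.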

Thanks.

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org
