Posted to j-users@xerces.apache.org by Jo...@sanofi-aventis.com on 2007/08/17 19:07:47 UTC

Handling multiple entries

Hello,
 
I've got an XML document of the following form:
 
<?xml version="1.0"?>
<Entrezgene>
...
</Entrezgene>
<Entrezgene>
...
</Entrezgene>
 
When I feed this to the SAX parser, it throws a SAXParseException:
"SystemID:
C:\data\entrez_gene\DATA\ASN_BINARY\Mammalia\Homo_sapiens\Homo_sapiens_small.xml
Location: 3443:2
Description: The markup in the document following the root element must
be well-formed."
 
A well-formed XML document can have only one root element, and the XML
Schema also defines a single <Entrezgene></Entrezgene> at the root of
the document.  So I can see why the parser throws this exception.
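A minimal Java sketch that reproduces the error, assuming the JDK's built-in JAXP SAX parser (a Xerces derivative); the exact message text and the way the system ID is expanded may vary by parser version and locale. It shows that SAXParseException carries the same system ID, line and column reported above. MultipleRootsDemo and tryParse are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: parses a string and reports any well-formedness error.
class MultipleRootsDemo {

    /** Returns null if the text parses, otherwise "systemId line:column message". */
    static String tryParse(String xml, String systemId) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        InputSource in = new InputSource(new StringReader(xml));
        in.setSystemId(systemId); // echoed back in the exception, like "SystemID:" above
        try {
            // DefaultHandler.fatalError rethrows, so the error surfaces here
            parser.parse(in, new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            return e.getSystemId() + " " + e.getLineNumber() + ":" + e.getColumnNumber()
                    + " " + e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        String oneRoot = "<?xml version=\"1.0\"?>\n<Entrezgene>a</Entrezgene>\n";
        String twoRoots = oneRoot + "<Entrezgene>b</Entrezgene>\n";
        System.out.println(tryParse(oneRoot, "one.xml"));  // null: single root is fine
        System.out.println(tryParse(twoRoots, "two.xml")); // well-formedness error
    }
}
```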
 
I tried to remedy this by enclosing the group of these elements in a
single <Entrezgene-Set> ... </Entrezgene-Set> element, which according
to the schema allows zero or more Entrezgene elements.  However, the
validators/editors I've used to validate such a document against the
schema fail to find any Entrezgene-Set declaration in the schema.  This
confuses me, because the declaration is in there, and I'm wondering
whether it's because the schema uses a lot of includes and references.
Will the Schema object handle schemas defined in this way?
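A sketch of the wrapping idea done at the byte level, assuming a UTF-8 (or plain ASCII) file with no byte-order mark and an XML declaration shorter than 1 KB; Java 11+ for readNBytes. The file's own <?xml ...?> declaration has to be dropped, because it is only allowed at the very start of a document. WrapRoots, wrap and countEntries are hypothetical helper names. This only addresses well-formedness; it says nothing about why the validators cannot find the Entrezgene-Set declaration.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: makes a multi-root file well-formed by adding a wrapper element.
class WrapRoots {

    /** Reads as "<wrapper>" + raw (minus a leading XML declaration) + "</wrapper>". */
    static InputStream wrap(InputStream raw, String wrapper) throws IOException {
        BufferedInputStream in = new BufferedInputStream(raw);
        in.mark(2048);
        byte[] head = in.readNBytes(1024);
        in.reset();
        // ISO-8859-1 maps bytes 1:1 to chars, so string indexes equal byte offsets
        String start = new String(head, StandardCharsets.ISO_8859_1);
        if (start.startsWith("<?xml")) {
            int end = start.indexOf("?>");
            if (end < 0) throw new IOException("unterminated XML declaration");
            in.readNBytes(end + 2); // consume and discard the declaration
        }
        InputStream open = new ByteArrayInputStream(("<" + wrapper + ">").getBytes(StandardCharsets.UTF_8));
        InputStream close = new ByteArrayInputStream(("</" + wrapper + ">").getBytes(StandardCharsets.UTF_8));
        return new SequenceInputStream(new SequenceInputStream(open, in), close);
    }

    /** Counts <Entrezgene> elements that are direct children of the wrapper. */
    static int countEntries(InputStream xml) throws Exception {
        int[] count = {0};
        SAXParserFactory.newInstance().newSAXParser().parse(xml, new DefaultHandler() {
            int depth = 0; // 0 = outside everything, 1 = inside the wrapper

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                if (depth == 1 && qName.equals("Entrezgene")) count[0]++;
                depth++;
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                depth--;
            }
        });
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        String doc = "<?xml version=\"1.0\"?>\n<Entrezgene>a</Entrezgene>\n<Entrezgene>b</Entrezgene>\n";
        InputStream raw = new ByteArrayInputStream(doc.getBytes(StandardCharsets.UTF_8));
        System.out.println(countEntries(wrap(raw, "Entrezgene-Set"))); // 2
    }
}
```

For a real file, the ByteArrayInputStream would be replaced by a FileInputStream on the data file.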
 
I also still receive the above error even after introducing filters to
mimic the Entrezgene-Set in my code.
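The sketch below suggests why a SAX filter on its own may not help. The filter is only a guess at the kind of filter meant here, not the code from this post: an XMLFilterImpl sees only the events the underlying parser has already produced, while the "markup following the root element" error is raised by the parser itself while scanning, so the parse still fails. WrapperFilter and parsesThroughFilter are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

class FilterLimitDemo {

    /** Hypothetical filter: injects a synthetic <Entrezgene-Set> around the document's events. */
    static class WrapperFilter extends XMLFilterImpl {
        WrapperFilter(XMLReader parent) { super(parent); }

        @Override
        public void startDocument() throws SAXException {
            super.startDocument();
            super.startElement("", "Entrezgene-Set", "Entrezgene-Set", new AttributesImpl());
        }

        @Override
        public void endDocument() throws SAXException {
            super.endElement("", "Entrezgene-Set", "Entrezgene-Set");
            super.endDocument();
        }
    }

    /** Returns true if the text parses through the filter without a fatal error. */
    static boolean parsesThroughFilter(String xml) throws Exception {
        XMLReader parser = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        WrapperFilter filter = new WrapperFilter(parser);
        filter.setContentHandler(new DefaultHandler());
        filter.setErrorHandler(new DefaultHandler()); // DefaultHandler rethrows fatal errors
        try {
            filter.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXParseException e) {
            return false; // raised by the parser's scanner, upstream of the filter
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parsesThroughFilter("<Entrezgene>a</Entrezgene>"));                          // true
        System.out.println(parsesThroughFilter("<Entrezgene>a</Entrezgene><Entrezgene>b</Entrezgene>")); // false
    }
}
```

In other words, the wrapper element has to exist in the text the parser reads, not just in the event stream after it.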
 
The alternative is to cut out the individual <Entrezgene> ... </Entrezgene>
blocks and stream each chunk to the parser as its own XML document.
Does anyone know how I can do this?  Would this require XML filters?
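A minimal sketch of the splitting approach, under stated assumptions: the whole file fits in memory as a String (a full Entrez Gene dump may need a line-by-line streaming variant instead), the tag is written exactly as <Entrezgene> with no attributes or extra whitespace, and it never appears inside comments or CDATA sections. Because it cuts the raw text before any parser runs, this approach needs no XML filter. SplitRoots and split are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: splits a multi-root file into one document per <Entrezgene>.
class SplitRoots {

    static final String OPEN = "<Entrezgene>";   // does not match "<Entrezgene_..." children
    static final String CLOSE = "</Entrezgene>";

    /** Returns each top-level <Entrezgene>...</Entrezgene> block as a separate string. */
    static List<String> split(String text) {
        List<String> chunks = new ArrayList<>();
        int depth = 0, start = -1, i = 0;
        while (true) {
            int o = text.indexOf(OPEN, i);
            int c = text.indexOf(CLOSE, i);
            if (c < 0) break; // no more closing tags
            if (o >= 0 && o < c) {
                if (depth++ == 0) start = o; // remember where a top-level block begins
                i = o + OPEN.length();
            } else {
                if (depth > 0 && --depth == 0) chunks.add(text.substring(start, c + CLOSE.length()));
                i = c + CLOSE.length();
            }
        }
        return chunks;
    }

    public static void main(String[] args) throws Exception {
        String doc = "<?xml version=\"1.0\"?>\n<Entrezgene>a</Entrezgene>\n<Entrezgene>b</Entrezgene>\n";
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        for (String chunk : split(doc)) {
            parser.reset(); // reuse one parser for every chunk
            parser.parse(new InputSource(new StringReader(chunk)), new DefaultHandler());
            System.out.println("parsed: " + chunk);
        }
    }
}
```

Each chunk is parsed as a complete one-root document, so the DefaultHandler would be replaced by the existing per-entry handler.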
 
Thank You,
 
John Ling