Posted to j-users@xerces.apache.org by Mauro Molinari <ma...@cardinis.com> on 2008/05/26 13:03:41 UTC

Newbie: help with getElementById on an XML validated against an XSD

Hi all!
I'm new to XML development. After struggling for HOURS, I then decided 
to ask for help...

I have the following problem. I've written a simple XSD and a simple XML 
document that uses that XSD (no DTD is defined for the XML). Both the XSD 
and the XML are valid (verified with Eclipse). They sit in the same 
directory, and the XML references the XSD through the schemaLocation 
attribute, giving the relative path to the XSD (i.e. just its 
filename).
I'm using JDK 1.5.0 and I need to write a parser that reads the XML and 
gets an element from it. This element is an attribute named "id" and of 
type "xs:ID".

My original code was the following:

DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder docb = dbf.newDocumentBuilder();
Document doc = 
docb.parse(getClass().getResourceAsStream(myXMLFileFullName));
doc.getElementById("foo"); // (*)

Problem #1: the statement at line (*) returns null, because the parser 
does not recognize the element "id" as an ID.
Debugging my code, I found that the implementation uses XercesJ.
Searching with google I found this:
http://xerces.apache.org/xerces2-j/faq-dom.html#faq-13
So, I modified my code as follows:

DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setValidating(true);
dbf.setFeature("http://apache.org/xml/features/validation/schema", 
    true); // (**)
DocumentBuilder docb = dbf.newDocumentBuilder();
Document doc = 
docb.parse(getClass().getResourceAsStream(myXMLFileFullName));
doc.getElementById("foo");

Apart from the fact that line (**) sounds to me like I've lost the 
abstraction from the underlying implementation, anyway I can't get it to 
work yet. The error I got is "cannot find definition for element 'bar'", 
where <bar> is actually the root element of my XML, as defined in the 
XSD. So, it seems it is not taking the XSD to do validation.

After searching a lot I found lots of possible hints or "solutions", like:

- setting on dbf the following attributes:
"http://java.sun.com/xml/jaxp/properties/schemaLanguage"
to "http://www.w3.org/2001/XMLSchema" and 
"http://java.sun.com/xml/jaxp/properties/schemaSource" to the full path 
of my XSD: do I really need this? Aren't these attributes depending on 
the underlying implementation? Isn't there a way to tell the parser to 
pick up the XSD automatically, since its path is specified in the 
schemaLocation attribute of the XML? Determining the actual full path to 
the XSD could be not so simple...

- setting on dbf the following attribute:
"http://java.sun.com/xml/jaxp/properties/schemaLanguage" to 
"http://www.w3.org/2001/XMLSchema"; this doesn't help

- setting a Schema instance through dbf.setSchema, by creating a Schema 
instance through SchemaFactory, created on my XSD file

- adding dbf.setNamespaceAware(true);

My problem is this: I can't get the whole thing to work and I can't 
understand what I really need among all this stuff. I tried to follow 
some examples I found on the net, but couldn't get it to work. Sometimes 
the parser simply says it can't find the definition of 'bar' element, 
sometimes it also says it can't locate the XSD. In any case, I can't get 
getElementById to work.

Any help would really be appreciated.

Thank you in advance.

-- 
Mauro Molinari
Software Developer
mauro.molinari@cardinis.com

---------------------------------------------------------------------
To unsubscribe, e-mail: j-users-unsubscribe@xerces.apache.org
For additional commands, e-mail: j-users-help@xerces.apache.org


Re: Newbie: help with getElementById on an XML validated against an XSD

Posted by Michael Glavassevich <mr...@ca.ibm.com>.
Hi Mauro,

Mauro Molinari <ma...@cardinis.com> wrote on 06/04/2008 04:36:42 AM:

> Hi Michael,
> thank you for your reply. In the end, I found a "solution" to my problem.
> First of all, I had to call DocumentBuilder.parse(InputStream, String),
> rather than DocumentBuilder.parse(InputStream) in order to make the
> parser find schemas, DTDs etc. referenced by the XML file, otherwise it
> searched for them using my IDE working directory as the base for
> resolving relative paths...

You should always provide a base URI to the parser, since it may need one
to resolve any relative URIs. If none is specified, Xerces falls back to
using the current working directory (the value of the system property
user.dir) as the base URI for resolution. It is even better if you can let
the parser open the InputStream itself (e.g. using
DocumentBuilder.parse(String)), where it has an opportunity to refresh the
base URI if it got redirected as a result of opening the URLConnection.
This is specifically relevant for HTTP URLs.
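
As a rough illustration of the two options described above, the sketch below parses the same file once from a stream with an explicit system ID and once by handing the parser the URI directly. The class and method names (BaseUriParse, parseStreamWithSystemId, parseByUri) are made up for this example and are not part of JAXP or Xerces; it also uses java.nio.file, which is newer than the JDK discussed in this thread.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper class, for illustration only.
public class BaseUriParse {

    private static DocumentBuilder newBuilder() throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        return dbf.newDocumentBuilder();
    }

    // Option 1: we open the stream ourselves, so we must pass the system ID
    // explicitly; it becomes the base URI for resolving relative references.
    static Document parseStreamWithSystemId(Path xmlFile) throws Exception {
        String systemId = xmlFile.toUri().toString();
        try (InputStream in = Files.newInputStream(xmlFile)) {
            return newBuilder().parse(in, systemId);
        }
    }

    // Option 2: hand the URI to the parser and let it open the resource itself.
    static Document parseByUri(Path xmlFile) throws Exception {
        return newBuilder().parse(xmlFile.toUri().toString());
    }

    public static void main(String[] args) throws Exception {
        Path xmlFile = Paths.get(args[0]);
        System.out.println(parseStreamWithSystemId(xmlFile).getDocumentURI());
        System.out.println(parseByUri(xmlFile).getDocumentURI());
    }
}
```

With either variant, a relative reference inside the document (such as a schemaLocation hint or a DTD system identifier) should be resolved against the document's own location rather than against user.dir.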

> Once I understood this, I found that the following code can do the job:
>
> DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
> dbf.setNamespaceAware(true);
> dbf.setValidating(true);
> dbf.setFeature("http://apache.org/xml/features/validation/schema", true);
> DocumentBuilder db = dbf.newDocumentBuilder();
> Document document = db.parse(xmlURL.openStream(),
> xmlURL.toURI().toString());
>
> In this way I don't have to specify a schema (it is automatically taken
> and parsed by Xerces thanks to the schemaLocation attribute in the XML),
> but I lose the abstraction from the underlying parser implementation by
> setting the Xerces feature needed to make schema validation (and
> getElementById()) work. Please remember that I'm using standard JSE 5
> APIs to do the XML parsing.

Right. That's a Xerces specific feature, but you could have used JAXP (e.g.
SchemaFactory) to accomplish the same thing and not tied yourself to the
Xerces implementation.

> So, I then decided to write a DTD for the XML and make the parser use it
> to enable getElementById(), although I don't like the solution so much
> (actually, having the schema, in this case the DTD is redundant, I use
> it only to make the parser work as expected, without the need of setting
> any Xerces-specific feature on the document builder factory).

There's no reason you couldn't have done this with schema too. As I said
above, you don't need to set any Xerces specific features.

> The resulting code is now:
>
> DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
> dbf.setNamespaceAware(true);
> DocumentBuilder db = dbf.newDocumentBuilder();
> Document document = db.parse(xmlURL.openStream(),
> xmlURL.toURI().toString());
>
> that actually seems more implementation-independent to me.
>
> Thanks again for your help!
>
> --
> Mauro Molinari
> Software Developer
> mauro.molinari@cardinis.com
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: j-users-unsubscribe@xerces.apache.org
> For additional commands, e-mail: j-users-help@xerces.apache.org

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org

Re: Newbie: help with getElementById on an XML validated against an XSD

Posted by Mauro Molinari <ma...@cardinis.com>.
Hi Michael,
thank you for your reply. In the end, I found a "solution" to my problem.
First of all, I had to call DocumentBuilder.parse(InputStream, String), 
rather than DocumentBuilder.parse(InputStream) in order to make the 
parser find schemas, DTDs etc. referenced by the XML file, otherwise it 
searched for them using my IDE working directory as the base for 
resolving relative paths...

Once I understood this, I found that the following code can do the job:

DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setNamespaceAware(true);
dbf.setValidating(true);
dbf.setFeature("http://apache.org/xml/features/validation/schema", true);
DocumentBuilder db = dbf.newDocumentBuilder();
Document document = db.parse(xmlURL.openStream(), 
xmlURL.toURI().toString());

In this way I don't have to specify a schema (it is automatically taken 
and parsed by Xerces thanks to the schemaLocation attribute in the XML), 
but I lose the abstraction from the underlying parser implementation by 
setting the Xerces feature needed to make schema validation (and 
getElementById()) work. Please remember that I'm using standard JSE 5 
APIs to do the XML parsing.

So, I then decided to write a DTD for the XML and make the parser use it 
to enable getElementById(), although I don't like the solution so much 
(actually, having the schema, in this case the DTD is redundant, I use 
it only to make the parser work as expected, without the need of setting 
any Xerces-specific feature on the document builder factory). The 
resulting code is now:

DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setNamespaceAware(true);
DocumentBuilder db = dbf.newDocumentBuilder();
Document document = db.parse(xmlURL.openStream(), 
xmlURL.toURI().toString());

that actually seems more implementation-independent to me.
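
For readers who want to try the DTD route described above, here is a minimal sketch. It assumes a made-up document with an internal DTD subset that declares the "id" attribute of a hypothetical <item> element as type ID; the class name DtdIdLookup and the element names are illustrative only, not taken from the real files.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical example class; the document content is invented.
public class DtdIdLookup {

    // The ATTLIST declaration is what tells the parser that "id" is an ID.
    static final String XML =
          "<!DOCTYPE bar [\n"
        + "  <!ELEMENT bar (item*)>\n"
        + "  <!ELEMENT item EMPTY>\n"
        + "  <!ATTLIST item id ID #REQUIRED>\n"
        + "]>\n"
        + "<bar><item id=\"foo\"/></bar>";

    static Element findById(String xml, String id) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        // No validation and no parser-specific features are switched on.
        DocumentBuilder db = dbf.newDocumentBuilder();
        Document doc = db.parse(new InputSource(new StringReader(xml)));
        return doc.getElementById(id);
    }

    public static void main(String[] args) throws Exception {
        Element e = findById(XML, "foo");
        System.out.println(e == null ? "not found" : e.getTagName());
    }
}
```

The lookup relies only on the attribute being declared with type ID in the DTD; the parser does not need to be validating for the declaration to be read.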

Thanks again for your help!

-- 
Mauro Molinari
Software Developer
mauro.molinari@cardinis.com



Re: Newbie: help with getElementById on an XML validated against an XSD

Posted by Michael Glavassevich <mr...@ca.ibm.com>.
Hi Mauro,

As you've already discovered there are many ways to enable schema
validation. If you have no particular reason to use the others I recommend
the JAXP 1.3 Validation API [1] methods. The samples [2][3] distributed
with Xerces demonstrate correct usage. Regardless of how you enable schema
validation you must always use a namespace aware parser, so setting
dbf.setNamespaceAware(true) is essential.
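
A minimal sketch of the JAXP Validation API route follows. It is not taken from the Xerces samples; the class name SchemaIdLookup, the inline schema, and the <bar>/<item> document are assumptions made so the example is self-contained. In a real program the Schema would normally be built from the XSD file or URL rather than from a string.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical example class; schema and document are invented.
public class SchemaIdLookup {

    // <item> carries an attribute "id" whose declared type is xs:ID.
    static final String XSD =
          "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
        + "<xs:element name=\"bar\"><xs:complexType><xs:sequence>"
        + "<xs:element name=\"item\" maxOccurs=\"unbounded\"><xs:complexType>"
        + "<xs:attribute name=\"id\" type=\"xs:ID\" use=\"required\"/>"
        + "</xs:complexType></xs:element>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    static final String XML = "<bar><item id=\"foo\"/></bar>";

    static Document parse(String xsd, String xml) throws Exception {
        // Compile the schema through the standard JAXP factory.
        SchemaFactory sf =
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = sf.newSchema(new StreamSource(new StringReader(xsd)));

        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // required for schema validation
        dbf.setSchema(schema);       // no setValidating(true), no Xerces feature
        DocumentBuilder db = dbf.newDocumentBuilder();
        return db.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Element e = parse(XSD, XML).getElementById("foo");
        System.out.println(e == null ? "not found" : e.getTagName());
    }
}
```

Note that setValidating(true) is deliberately left out: that flag asks for DTD validation, while the Schema set on the factory drives the schema validation here. With the Xerces-based parser this thread is about, the schema type information should also let getElementById() find the element.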

If you've got everything set up correctly and it still doesn't work for
you, then based on what you wrote it would seem to be because you're
expecting the getElementById() [4] method to do something it doesn't.
Specifically, the method "returns the Element that has an ID attribute with
the given value", so if the declared type of an element is "xs:ID" you won't
find it this way.

Thanks.

[1]
http://xerces.apache.org/xerces2-j/javadocs/api/javax/xml/validation/package-summary.html
[2] http://xerces.apache.org/xerces2-j/samples-jaxp.html#ParserAPIUsage
[3] http://xerces.apache.org/xerces2-j/samples-jaxp.html#SourceValidator
[4]
http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#ID-getElBId

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org
