Posted to c-users@xerces.apache.org by Ciaran McHale <ci...@iona.com> on 2006/03/07 15:48:03 UTC

xsd:include and xsd:import

Hi folks,

I am using Xerces C++ 2.7.0. Mostly, it works great, but I am finding
some unexpected behavior with "xsd:include" and "xsd:import" (where
"xsd" denotes the XML Schema namespace). The "included.xsd" file below
illustrates the problem.

----start of "included.xsd" file----
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema
     xmlns:xsd="http://www.w3.org/2001/XMLSchema"
     elementFormDefault="qualified"
     targetNamespace="http://www.lanw.com/namespaces/pub"
     xmlns="http://www.lanw.com/namespaces/pub">

         <xsd:include schemaLocation = "foo.xsd"/>

</xsd:schema>
----end of "included.xsd" file----

Near the end of this email I provide the relevant piece of code that I
use to parse the file.

If the included file ("foo.xsd") exists and contains a (deliberate) error
then Xerces reports the error. So far, so good.

Problem 1
---------
If the included file ("foo.xsd") does not exist then I would expect
Xerces to report an error, but Xerces just silently ignores the error.

Problem 2
---------
If the included file ("foo.xsd") exists and does not contain any errors
then Xerces parses everything okay. However, the resultant DOM tree does
not contain any nodes from the included file. This surprised me because
Xerces is a Schema-validating parser and "xsd:include" is part of XML
schema so I assumed that Xerces would insert contents of the included
file in the DOM tree.

If I replace the "xsd:include" element with a "xsd:import" element
then I still see problems 1 and 2 above.

I suspect that Problem 1 is a bug in Xerces. However, I am not sure about
Problem 2. Is that a bug? Or am I misunderstanding the division of
responsibilities between Xerces and the application developer? Does the
application developer (that is, me) have to walk the DOM tree to find
all "xsd:include" or "xsd:import" elements and call Xerces to parse each
of the specified files and then somehow merge the multiple parse trees?
If this is the case then I would appreciate some pointers to relevant
parts of the Xerces documentation or demos that illustrate this.

Below is the code (cut-n-pasted from my much longer application) that I
use to parse XML files. Perhaps somebody on this mailing list can spot if
I am neglecting to set a relevant option on the parser.

----start of code to use parse file----
// inputFile = name of input file
// isXsdFile = true if input filename ends with ".xsd"

try {
	XMLPlatformUtils::Initialize();
} catch(XMLException & ex) {
	cerr	<< "XMLPlatformUtils::Initialize() failed: "
		<< StrX(ex.getMessage())
		<< endl;
	return false;
}
DomXmlParserErrorHandler 	err_handler;
parser = new XercesDOMParser();

parser->setDoNamespaces(true);
parser->setCreateEntityReferenceNodes(true);
parser->setErrorHandler(&err_handler);
if (isXsdFile) {
	parser->setDoSchema(false);
	parser->setValidationSchemaFullChecking(false);
	parser->setValidationScheme(XercesDOMParser::Val_Never);
} else {
	parser->setDoSchema(true);
	parser->setValidationSchemaFullChecking(true);
	parser->setValidationScheme(XercesDOMParser::Val_Always);
}
try {
	if (isXsdFile) {
		parser->loadGrammar(inputFile,
				Grammar::SchemaGrammarType, true);
	}
	parser->parse(inputFile);
} catch(const XMLException & ex) {
	cerr	<< "ERROR: " << inputFile << ": "
		<< StrX(ex.getMessage())
		<< endl;
	delete parser;
	return 0;
}
----end of code to use parse file----


Regards,
Ciaran.
--
Ciaran McHale, Ph.D.        Email: Ciaran.McHale@iona.com
Principal Consultant        Tel: +44-(0)7866-416-134 (mobile)
IONA Technologies, UK       Tel: +44-(0)118-954-6632 (home office)
                             Fax: +44-(0)118-954-6767


Re: xsd:include and xsd:import

Posted by Ciaran McHale <ci...@iona.com>.
Hi folks,

First off, thanks to David, Boris and Axel for their very helpful
responses. I have been subscribed to this mailing list for a few
months and have found that it has a very high signal-to-noise ratio.

At 22:45 08/03/2006, Axel Weiß wrote:
 >[wonders what Ciaran is using Xerces for]

Put simply, I parse and validate an input XML or XSD file, walk the
resulting DOM tree and execute a few print statements for each node
that I encounter. The print statements generate a Tcl script that I
feed into a Tcl interpreter. By doing this, I am obtaining a DOM-like
tree structure in Tcl.

<Aside>
	There are probably several people on this mailing list thinking
	"But that's crazy; there is already a Tcl extension available
	that provides access to an XML parser and DOM tree". I am aware
	of that, but there is method to my madness of re-inventing the
	wheel. It would take too long to explain in depth why I am
	doing this, but it is all part of what will ultimately
	be released as an open-source project. When the project is
	released to the world as some working code and documentation then
	I will send an announcement to this mailing list.
</Aside>

For my purposes, it would have been ideal if Xerces could automatically
insert the DOM tree from an xsd:import or xsd:include element into the
primary DOM tree. However, now that David and Boris have pointed me in
the right direction to do the import/include myself, it seems to be just
a SMOP (Simple Matter Of Programming).


Regards,
Ciaran.


Re: xsd:include and xsd:import

Posted by Axel Weiß <aw...@informatik.hu-berlin.de>.
Ciaran McHale wrote:
> Hi folks,
>
> I am using Xerces C++ 2.7.0. Mostly, it works great, but I am finding
> some unexpected behavior with "xsd:include" and "xsd:import" (where
> "xsd" denotes the XML Schema namespace). The "included.xsd" file below
> illustrates the problem.
>
> ----start of "included.xsd" file----
> <?xml version="1.0" encoding="UTF-8"?>
> <xsd:schema
>      xmlns:xsd="http://www.w3.org/2001/XMLSchema"
>      elementFormDefault="qualified"
>      targetNamespace="http://www.lanw.com/namespaces/pub"
>      xmlns="http://www.lanw.com/namespaces/pub">
>
>          <xsd:include schemaLocation = "foo.xsd"/>
>
> </xsd:schema>
> ----end of "included.xsd" file----
>
> Near the end of this email I provide the relevant piece of code that I
> use to parse the file.
>
> If the included file ("foo.xsd") exists and contains a (deliberate)
> error then Xerces reports the error. So far, so good.
>
> Problem 1
> ---------
> If the included file ("foo.xsd") does not exist then I would expect
> Xerces to report an error, but Xerces just silently ignores the error.
>
> Problem 2
> ---------
> If the included file ("foo.xsd") exists and does not contain any
> errors then Xerces parses everything okay. However, the resultant DOM
> tree does not contain any nodes from the included file. This surprised
> me because Xerces is a Schema-validating parser and "xsd:include" is
> part of XML schema so I assumed that Xerces would insert contents of
> the included file in the DOM tree.

Hi Ciaran,

What are you trying to do? Are you parsing XML files and validating them
against your schemas, or are you parsing schema files and treating them
as XML?

In the former case, xsd:include and xsd:import behave as you expect,
i.e. the referenced files are fetched and their contents fed into the
grammar, recursively.

In the latter case, you are responsible for interpreting the schema
tags yourself, since they are treated as XML, not as schema. Your code
shows that you switch off validation when you expect to parse a schema
(which is fully correct). What are you planning to do with a DOM tree
that contains a schema?

Maybe there is some confusion about the way Xerces checks against
schemata. The checking is done (nearly) invisibly, and the schema
contents can be accessed via grammar objects, not via DOM trees.

Could you shed some light on what you are doing?

Cheers,
			Axel

Re: xsd:include and xsd:import

Posted by David Bertoni <db...@apache.org>.
Ciaran McHale wrote:
> Hi folks,
> 
> I am using Xerces C++ 2.7.0. Mostly, it works great, but I am finding
> some unexpected behavior with "xsd:include" and "xsd:import" (where
> "xsd" denotes the XML Schema namespace). The "included.xsd" file below
> illustrates the problem.
> 
> ----start of "included.xsd" file----
> <?xml version="1.0" encoding="UTF-8"?>
> <xsd:schema
>     xmlns:xsd="http://www.w3.org/2001/XMLSchema"
>     elementFormDefault="qualified"
>     targetNamespace="http://www.lanw.com/namespaces/pub"
>     xmlns="http://www.lanw.com/namespaces/pub">
> 
>         <xsd:include schemaLocation = "foo.xsd"/>
> 
> </xsd:schema>
> ----end of "included.xsd" file----
> 
> Near the end of this email I provide the relevant piece of code that I
> use to parse the file.
> 
> If the included file ("foo.xsd") exists and contains a (deliberate) error
> then Xerces reports the error. So far, so good.
> 
> Problem 1
> ---------
> If the included file ("foo.xsd") does not exist then I would expect
> Xerces to report an error, but Xerces just silently ignores the error.
> 
> Problem 2
> ---------
> If the included file ("foo.xsd") exists and does not contain any errors
> then Xerces parses everything okay. However, the resultant DOM tree does
> not contain any nodes from the included file. This surprised me because
> Xerces is a Schema-validating parser and "xsd:include" is part of XML
> schema so I assumed that Xerces would insert contents of the included
> file in the DOM tree.
> 
> If I replace the "xsd:include" element with a "xsd:import" element
> then I still see problems 1 and 2 above.
> 
> I suspect that Problem 1 is a bug in Xerces. However, I am not sure about
> Problem 2. Is that a bug? Or am I misunderstanding the division of
> responsibilities between Xerces and the application developer? Does the
> application developer (that is, me) have to walk the DOM tree to find
> all "xsd:include" or "xsd:import" elements and call Xerces to parse each
> of the specified files and then somehow merge the multiple parse trees?
> If this is the case then I would appreciate some pointers to relevant
> parts of the Xerces documentation or demos that illustrate this.
> 

You are assuming that Xerces-C, when parsing an XML Schema document, 
will interpret Schema-specific elements and/or attributes in some 
special way.  However, this is not true.

If you want to implement an application that interprets the XML Schema 
include and import elements, you will have to build that yourself.  The 
DOM has APIs that can help you do this, including:

     DOMDocument::getElementsByTagNameNS()
     DOMDocument::importNode()
     DOMNode::appendChild()

Any good tutorial on the DOM will help you figure out how these APIs work.

Dave
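
As an illustration of the do-it-yourself merge that Dave describes, here
is a minimal sketch of the logic. It deliberately does not use Xerces:
the Node struct, deepCopy(), Loader and mergeIncludes() are hypothetical
stand-ins invented for this example. In a real application their roles
would presumably be played by DOMDocument::getElementsByTagNameNS()
(finding the include elements), DOMDocument::importNode() (the deep
copy) and DOMNode::appendChild() (attaching the copies). The sketch also
assumes the element is spelled "xsd:include"; real code should match on
the namespace URI and local name rather than on the prefix.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Toy stand-in for a DOM element. This is NOT the Xerces-C++ API.
struct Node {
    std::string name;                               // e.g. "xsd:include"
    std::map<std::string, std::string> attrs;       // e.g. schemaLocation
    std::vector<std::unique_ptr<Node>> children;
};

// Deep copy of a subtree; plays the role of importNode(node, true).
std::unique_ptr<Node> deepCopy(const Node& src) {
    auto copy = std::make_unique<Node>();
    copy->name = src.name;
    copy->attrs = src.attrs;
    for (const auto& child : src.children)
        copy->children.push_back(deepCopy(*child));
    return copy;
}

// Hypothetical loader: returns the parsed root for a schemaLocation,
// or nullptr if the file could not be found or parsed.
using Loader = std::function<const Node*(const std::string&)>;

// Replaces each top-level "xsd:include" child of `schema` with copies of
// the children of the included root. Includes that cannot be loaded are
// left in place, and their locations are returned so that the caller can
// report them.
std::vector<std::string> mergeIncludes(Node& schema, const Loader& load) {
    assert(load && "loader must be callable");
    std::vector<std::string> missing;
    std::vector<std::unique_ptr<Node>> merged;
    for (auto& child : schema.children) {
        if (child->name != "xsd:include") {
            merged.push_back(std::move(child));     // ordinary node: keep
            continue;
        }
        auto it = child->attrs.find("schemaLocation");
        const std::string location =
            (it == child->attrs.end()) ? std::string() : it->second;
        const Node* included = load(location);
        if (included == nullptr) {
            missing.push_back(location);            // report, do not drop
            merged.push_back(std::move(child));
            continue;
        }
        for (const auto& inc : included->children)
            merged.push_back(deepCopy(*inc));       // splice in the copies
    }
    schema.children = std::move(merged);
    return missing;
}
```

The sketch handles only one level of inclusion. Included files that
themselves contain xsd:include elements would need a recursive call with
some guard against cycles. Returning the list of unresolved locations
gives the application a place to report a missing "foo.xsd" itself.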