
Posted to j-users@xerces.apache.org by Dick Deneer <di...@donkeydevelopment.com> on 2006/07/04 19:59:17 UTC

Using grammarpool with included schemas

Hi,

I am using a grammar pool to cache schemas, and I want the user to point
to a number of schemas that will be added to the grammar pool. The
grammar pool is then used to validate an XML instance document.
One of the schemas (top) uses an include of another schema (sub).
If I can rely on the location given in the include, there is no
problem: I just add "top" to the pool and everything is fine.
But when the path in the top schema is invalid, the included schema
cannot be found. My first idea was to let the user add the
sub schema to the pool as well. But this has no effect: the
sub schema is simply ignored and does not end up in the grammar pool
(likely because there is already a grammar with the same namespace
in the pool, namely "top"). Validation is then incomplete because
the type definitions in sub are missing.

Working in the other direction (first add sub, then top) does not
work either: only the sub schema is added to the grammar pool.

The ideal situation for me would be to simply add another schema to
the pool whenever the parser reports that something cannot be found.

Is there no other way than using an EntityResolver?
If that is the only way, can I detect in the EntityResolver whether the
parser will find the schema by itself? If it will not, I have to return
my own InputSource.
Can I also use such an EntityResolver with an XMLGrammarPreparser?


PS
You can easily "replay" the above situations with the
XMLGrammarBuilder.java program that is supplied with Xerces. Using the
DOM or SAX parser directly with a grammar pool (without preparsing)
gives the same results.
I used these XML and schema files:

top.xsd:
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
	<xsd:include schemaLocation="include.xsd"/>
	<xsd:element name="myRoot" type="myRootType"/>
	<xsd:complexType name="myRootType">
		<xsd:sequence>
			<xsd:element name="label" type="labelType"/>
		</xsd:sequence>
	</xsd:complexType>
</xsd:schema>

include.xsd:
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
	<xsd:simpleType name="labelType">
		<xsd:restriction base="xsd:string">
			<xsd:enumeration value="01"/>
			<xsd:enumeration value="02"/>
		</xsd:restriction>
	</xsd:simpleType>
</xsd:schema>

instance.xml:
<?xml version="1.0" encoding="UTF-8"?>
<myRoot>
     <label>028</label>
</myRoot>
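A minimal sketch of one way around the missing include, assuming the JAXP 1.3 validation API (javax.xml.validation) instead of the Xerces grammar pool, and Java 8 or later: an LSResourceResolver registered on the SchemaFactory serves an in-memory copy of include.xsd whenever the include is requested, so compiling top.xsd no longer depends on the path written in it. The class name, the bogus base URI file:///nonexistent/top.xsd and the match-on-file-name rule are illustrative assumptions, not anything Xerces prescribes.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.xml.sax.SAXException;

public class IncludeFallbackDemo {
    // Same content as top.xsd above.
    static final String TOP_XSD =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        + "<xsd:include schemaLocation='include.xsd'/>"
        + "<xsd:element name='myRoot' type='myRootType'/>"
        + "<xsd:complexType name='myRootType'><xsd:sequence>"
        + "<xsd:element name='label' type='labelType'/>"
        + "</xsd:sequence></xsd:complexType>"
        + "</xsd:schema>";

    // Same content as include.xsd above, as if the user had picked it by hand.
    static final String INCLUDE_XSD =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        + "<xsd:simpleType name='labelType'><xsd:restriction base='xsd:string'>"
        + "<xsd:enumeration value='01'/><xsd:enumeration value='02'/>"
        + "</xsd:restriction></xsd:simpleType>"
        + "</xsd:schema>";

    static int resolverCalls = 0;

    static Schema compileTopSchema() throws Exception {
        DOMImplementationLS ls = (DOMImplementationLS) DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().getDOMImplementation();
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        factory.setResourceResolver((type, namespaceURI, publicId, systemId, baseURI) -> {
            if (systemId != null && systemId.endsWith("include.xsd")) {
                resolverCalls++;
                LSInput input = ls.createLSInput();
                input.setStringData(INCLUDE_XSD); // serve the user's copy from memory
                input.setSystemId(systemId);
                input.setBaseURI(baseURI);
                return input;
            }
            return null; // anything else: let the parser resolve it as usual
        });
        // The base URI points at a directory that does not exist, so without
        // the resolver the include could not be loaded from disk.
        return factory.newSchema(
                new StreamSource(new StringReader(TOP_XSD), "file:///nonexistent/top.xsd"));
    }

    static boolean isValid(Schema schema, String label) throws Exception {
        Validator validator = schema.newValidator();
        String xml = "<myRoot><label>" + label + "</label></myRoot>";
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // e.g. the enumeration error for "028"
        }
    }

    public static void main(String[] args) throws Exception {
        Schema schema = compileTopSchema();
        System.out.println("01 valid:  " + isValid(schema, "01"));
        System.out.println("028 valid: " + isValid(schema, "028"));
    }
}
```

Requests the resolver does not recognise return null, which leaves resolution to the parser as usual.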

Re: Using grammarpool with included schemas

Posted by Michael Glavassevich <mr...@ca.ibm.com>.

Dick Deneer <di...@donkeydevelopment.com> wrote on 07/09/2006 01:11:26 PM:

> Thank you very much for this explanation.
> I was not aware of the XERCESJ-1158 discussion. But I am very glad to
> see that others have had their doubts about this topic too; this
> makes me feel less stupid :)
>
> But I am still struggling on.
> In my XML tool you can browse the XML in a tree view and a source view.
> In the tree view there is a DOM behind it, and I am working with the
> Xerces DOM Level 3 revalidation.
> In the past I tried to convert this to the JAXP Validator using
> a DOMSource, but there was a blocking issue: in case of an error the
> validator passes a SAXException with no reference to the current
> element node. We already discussed this issue in the past, and you
> were thinking about an extra property to solve this. If this has
> already been implemented, I missed it.

It was implemented in Xerces 2.8.0. If your ErrorHandler has a reference 
to the Validator and you query the 
http://apache.org/xml/properties/dom/current-element-node property [1] the 
Validator will return the current element node that is being visited.
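A sketch of what is described above, assuming a JAXP implementation that recognises the property (the code tolerates one that does not and then reports an unknown node): the ErrorHandler keeps a reference to the Validator and asks it for the current element node each time an error is reported during DOMSource validation. The class name, the toy schema and the report format are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class CurrentNodeDemo {
    static final String CURRENT_ELEMENT_NODE =
            "http://apache.org/xml/properties/dom/current-element-node";

    // Hypothetical toy schema: <myRoot> holding one integer <label>.
    static final String SCHEMA =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        + "<xsd:element name='myRoot'><xsd:complexType><xsd:sequence>"
        + "<xsd:element name='label' type='xsd:int'/>"
        + "</xsd:sequence></xsd:complexType></xsd:element>"
        + "</xsd:schema>";

    /** Validates a DOM and returns one "nodeName: message" line per error. */
    static List<String> validate(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

        Validator validator = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(SCHEMA))).newValidator();
        List<String> report = new ArrayList<>();
        validator.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { }
            @Override public void error(SAXParseException e) { record(e); }
            @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }

            private void record(SAXParseException e) {
                Node current = null;
                try {
                    // Ask the Validator which DOM element it is visiting right now.
                    current = (Node) validator.getProperty(CURRENT_ELEMENT_NODE);
                } catch (SAXException unsupported) {
                    // Property not recognised by this implementation: no node available.
                }
                String where = (current == null) ? "(unknown node)" : current.getNodeName();
                report.add(where + ": " + e.getMessage());
            }
        });
        validator.validate(new DOMSource(doc));
        return report;
    }

    public static void main(String[] args) throws Exception {
        for (String line : validate("<myRoot><label>abc</label></myRoot>")) {
            System.out.println(line);
        }
    }
}
```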
 
> In the source view I am working with SAX.
> When switching between the source and DOM views I want to reuse the
> compiled grammars between them.
>
> I did some tests with the Validator API, but I see a lot of problems:
>
> - It looks very difficult to hand a filled grammar pool over to the
> validator. I thought about just reading the grammars from the
> filled grammar pool, then creating a Schema with the SchemaFactory
> and then adding the grammars to the Schema, which is itself a
> grammar pool (or is this in fact the bucket and not the real
> grammar pool?). But I printed the Schema classes created by the
> SchemaFactory, and they differ depending on the number of schemas
> supplied. So I am afraid I cannot start with just an empty Schema
> instance.

Couldn't you just create the Schema from the SchemaFactory using all of 
the sources that you would have used to populate the grammar pool? It's true 
that the Schema implementations in Xerces are backed by a grammar pool but 
you really shouldn't be trying to do anything with these grammar pools. 
They are only intended for consumption by the JAXP validator. If you try 
writing to them you'll find three of the four implementations are 
immutable and the fourth is a memory sensitive cache which releases 
grammars to the garbage collector in response to memory demand so the 
grammar you add might get tossed at some point.
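One way to read this suggestion, sketched under assumptions: compile every schema the user selected into one Schema with SchemaFactory.newSchema(Source[]), then reuse that single object for both views, a Validator over a DOMSource for the tree view and a ValidatorHandler fed by a SAX XMLReader for the source view. The two namespaces, the schemas and the file:///schemas/... system IDs are made-up placeholders, not Dick's real files.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Source;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.ValidatorHandler;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

public class SharedSchemaDemo {
    // Two made-up schemas in different target namespaces.
    static final String LABELS_XSD =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema' targetNamespace='urn:example:labels'>"
        + "<xsd:element name='label'><xsd:simpleType><xsd:restriction base='xsd:string'>"
        + "<xsd:enumeration value='01'/><xsd:enumeration value='02'/>"
        + "</xsd:restriction></xsd:simpleType></xsd:element></xsd:schema>";
    static final String COUNTS_XSD =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema' targetNamespace='urn:example:counts'>"
        + "<xsd:element name='count' type='xsd:int'/></xsd:schema>";

    /** Compiles everything the user picked into one Schema. */
    static Schema compileAll() throws SAXException {
        Source[] sources = {
            // The system IDs only label the documents; the content comes from the readers.
            new StreamSource(new StringReader(LABELS_XSD), "file:///schemas/labels.xsd"),
            new StreamSource(new StringReader(COUNTS_XSD), "file:///schemas/counts.xsd")
        };
        return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI).newSchema(sources);
    }

    /** Tree view: validate an already-built DOM. */
    static boolean validDom(Schema schema, String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        try {
            schema.newValidator().validate(new DOMSource(doc));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    /** Source view: validate the SAX event stream while it is parsed. */
    static boolean validSax(Schema schema, String xml) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        XMLReader reader = spf.newSAXParser().getXMLReader();
        ValidatorHandler handler = schema.newValidatorHandler();
        reader.setContentHandler(handler);
        try {
            reader.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // a validation error (or malformed XML)
        }
    }

    public static void main(String[] args) throws Exception {
        Schema schema = compileAll(); // compiled once, used by both views
        System.out.println(validDom(schema, "<label xmlns='urn:example:labels'>028</label>"));
        System.out.println(validSax(schema, "<count xmlns='urn:example:counts'>42</count>"));
    }
}
```

The JAXP documentation describes Schema objects as thread-safe and shareable, while Validator and ValidatorHandler instances are not, which is why each call above creates its own.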

> - I also did a test with the LSResourceResolver, using the following XML:
> <purchaseOrder orderDate="1999-10-20" xmlns="http://tempuri.org/po.xsd"
>     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>     xsi:schemaLocation="http://tempuri.org/po.xsd C:\Temp\Schemas\apo.xsd">
> If I created an empty schema (Schema schema = factory.newSchema();),
> the resolver was called and I got the following output:
> DTD Support : false
> schema org.apache.xerces.jaxp.validation.WeakReferenceXMLSchema@6f7ce9
> validator has resource resolver ValidatorEx$4@aa57fb
> Validator Asking for resourece:
> --type                    http://www.w3.org/2001/XMLSchema
> --namespaceURI            http://tempuri.org/po.xsd
> --resolve Entity publicId null
> --resolve Entity systemId C:\Temp\Schemas\apo.xsd
> --baseURI                 file:///Users/dick/Documents/workspace/Tester/po.xml
> validator warning org.xml.sax.SAXParseException: schema_reference.4:
> Failed to read schema document 'C:\Temp\Schemas\apo.xsd', because 1)
> could not find the document; 2) the document could not be read; 3)
> the root element of the document is not <xsd:schema>.
> validator error org.xml.sax.SAXParseException: cvc-elt.1: Cannot
> find the declaration of element 'purchaseOrder'.
> cause null
> getException null
> org.xml.sax.SAXParseException: cvc-elt.1: Cannot find the
> declaration of element 'purchaseOrder'.
>
> If I created a non-empty schema
> (Schema schema = factory.newSchema(new StreamSource[]{pe});),
> where the schema has a different namespace than the one referenced in
> the XML, the resolver was NOT called and I got the following output:
> DTD Support : false
> schema org.apache.xerces.jaxp.validation.SimpleXMLSchema@a352a5
> schema is an XMLGrammarPool
> --found XSGrammar in pool---
> Grammar is instanceof XMLSchemaDescription
> XMLSchemaDescription.getLiteralSystemId file:///Users/dick/Documents/workspace/Tester/personal.xsd
>
> validator has resource resolver ValidatorEx$4@e1d5ea
> validator error org.xml.sax.SAXParseException: cvc-elt.1: Cannot
> find the declaration of element 'purchaseOrder'.
> cause null
> getException null
> org.xml.sax.SAXParseException: cvc-elt.1: Cannot find the
> declaration of element 'purchaseOrder'
>
> When I also add the referenced schema
> (Schema schema = factory.newSchema(new StreamSource[]{pe,po});),
> the instance is checked and errors are reported as expected:
> DTD Support : false
> schema org.apache.xerces.jaxp.validation.XMLSchema@406199
> validator has resource resolver ValidatorEx$4@2b3d53
> validator error org.xml.sax.SAXParseException: cvc-pattern-valid:
> Value '926-AAX' is not facet-valid with respect to pattern
> '\d{3}-[A-Z]{2}' for type 'SKU'.
> cause null
> getException null
> org.xml.sax.SAXParseException: cvc-pattern-valid: Value '926-AAX' is
> not facet-valid with respect to pattern '\d{3}-[A-Z]{2}' for type 'SKU'.
>
> I do not understand why the resolver is not called in the second case.

The Schema returned by SchemaFactory.newSchema() [2] has different 
semantics to the ones returned by the other newSchema() [3] methods. Think 
of the SchemaFactory.newSchema() schema as open, rather than empty. It 
pulls in schema components during validation from the schema locations 
specified in the documents and/or the sources returned by your 
LSResourceResolver if you registered one with the Validator. The Schema 
objects returned by the other newSchema() methods are closed (or fully 
composed). Only the schema components which were loaded by the 
SchemaFactory are used for validation. Your LSResourceResolver won't be 
called during validation and any schema location hints in the document 
will be ignored.
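A small experiment that reproduces this difference, assuming the JDK's built-in JAXP implementation: the same counting LSResourceResolver is registered on a validator from an open schema (newSchema()) and on one from a closed schema compiled from an unrelated, made-up schema that plays the role of personal.xsd. The instance carries an xsi:schemaLocation hint that cannot be loaded. The class name, the urn:example:personal namespace and the apo.xsd hint location are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.concurrent.atomic.AtomicInteger;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class OpenVsClosedDemo {
    static final String PO_NS = "http://tempuri.org/po.xsd";

    // Made-up stand-in for personal.xsd: a schema for some other namespace.
    static final String OTHER_XSD =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'"
        + " targetNamespace='urn:example:personal'>"
        + "<xsd:element name='person' type='xsd:string'/></xsd:schema>";

    // Instance whose location hint cannot be loaded (the directory does not exist).
    static final String INSTANCE =
          "<purchaseOrder xmlns='" + PO_NS + "'"
        + " xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'"
        + " xsi:schemaLocation='" + PO_NS + " apo.xsd'/>";

    /** Validates INSTANCE and returns how many times the resolver was consulted. */
    static int resolverCalls(Schema schema) throws Exception {
        AtomicInteger calls = new AtomicInteger();
        Validator validator = schema.newValidator();
        validator.setResourceResolver((type, namespaceURI, publicId, systemId, baseURI) -> {
            calls.incrementAndGet();
            return null; // decline: the parser falls back to the hint itself
        });
        try {
            validator.validate(new StreamSource(new StringReader(INSTANCE), "file:///nonexistent/po.xml"));
        } catch (SAXException | IOException expected) {
            // Validation fails either way (cvc-elt.1); only the call count matters here.
        }
        return calls.get();
    }

    public static void main(String[] args) throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema open = factory.newSchema();
        Schema closed = factory.newSchema(new StreamSource(new StringReader(OTHER_XSD)));
        System.out.println("open schema, resolver calls:   " + resolverCalls(open));
        System.out.println("closed schema, resolver calls: " + resolverCalls(closed));
    }
}
```

A positive count for the open schema and zero for the closed one would match the semantics described above; the exact number of calls is an implementation detail.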

> I attached my test resources.

[1] 
http://xerces.apache.org/xerces2-j/properties.html#dom.current-element-node
[2] 
http://xerces.apache.org/xerces2-j/javadocs/api/javax/xml/validation/SchemaFactory.html#newSchema()
[3] 
http://xerces.apache.org/xerces2-j/javadocs/api/javax/xml/validation/SchemaFactory.html#newSchema(javax.xml.transform.Source[])

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: j-users-unsubscribe@xerces.apache.org
For additional commands, e-mail: j-users-help@xerces.apache.org

Re: Using grammarpool with included schemas

Posted by Dick Deneer <di...@donkeydevelopment.com>.

Thank you very much for this explanation.
I was not aware of the XERCESJ-1158 discussion. But I am very glad to
see that others have had their doubts about this topic too; this
makes me feel less stupid :)

But I am still struggling on.
In my XML tool you can browse the XML in a tree view and a source view.
In the tree view there is a DOM behind it, and I am working with the
Xerces DOM Level 3 revalidation.
In the past I tried to convert this to the JAXP Validator using
a DOMSource, but there was a blocking issue: in case of an error the
validator passes a SAXException with no reference to the current
element node. We already discussed this issue in the past, and you
were thinking about an extra property to solve this. If this has
already been implemented, I missed it.

In the source view I am working with SAX.
When switching between the source and DOM views I want to reuse the
compiled grammars between them.

I did some tests with the Validator API, but I see a lot of problems:

- It looks very difficult to hand a filled grammar pool over to the
validator. I thought about just reading the grammars from the
filled grammar pool, then creating a Schema with the SchemaFactory and
then adding the grammars to the Schema, which is itself a
grammar pool (or is this in fact the bucket and not the real
grammar pool?). But I printed the Schema classes created by the
SchemaFactory, and they differ depending on the number of schemas
supplied. So I am afraid I cannot start with just an empty Schema
instance.

- I also did a test with the LSResourceResolver, using the following XML:
<purchaseOrder orderDate="1999-10-20" xmlns="http://tempuri.org/po.xsd"
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:schemaLocation="http://tempuri.org/po.xsd C:\Temp\Schemas\apo.xsd">
If I created an empty schema (Schema schema = factory.newSchema();),
the resolver was called and I got the following output:
DTD Support : false
schema org.apache.xerces.jaxp.validation.WeakReferenceXMLSchema@6f7ce9
validator has resource resolver ValidatorEx$4@aa57fb
Validator Asking for resourece:
--type                    http://www.w3.org/2001/XMLSchema
--namespaceURI            http://tempuri.org/po.xsd
--resolve Entity publicId null
--resolve Entity systemId C:\Temp\Schemas\apo.xsd
--baseURI                 file:///Users/dick/Documents/workspace/Tester/po.xml
validator warning org.xml.sax.SAXParseException: schema_reference.4:
Failed to read schema document 'C:\Temp\Schemas\apo.xsd', because 1)
could not find the document; 2) the document could not be read; 3)
the root element of the document is not <xsd:schema>.
validator error org.xml.sax.SAXParseException: cvc-elt.1: Cannot find
the declaration of element 'purchaseOrder'.
cause null
getException null
org.xml.sax.SAXParseException: cvc-elt.1: Cannot find the declaration
of element 'purchaseOrder'.

If I created a non-empty schema
(Schema schema = factory.newSchema(new StreamSource[]{pe});),
where the schema has a different namespace than the one referenced in
the XML, the resolver was NOT called and I got the following output:
DTD Support : false
schema org.apache.xerces.jaxp.validation.SimpleXMLSchema@a352a5
schema is an XMLGrammarPool
--found XSGrammar in pool---
Grammar is instanceof XMLSchemaDescription
XMLSchemaDescription.getLiteralSystemId file:///Users/dick/Documents/workspace/Tester/personal.xsd

validator has resource resolver ValidatorEx$4@e1d5ea
validator error org.xml.sax.SAXParseException: cvc-elt.1: Cannot find
the declaration of element 'purchaseOrder'.
cause null
getException null
org.xml.sax.SAXParseException: cvc-elt.1: Cannot find the declaration
of element 'purchaseOrder'

When I also add the referenced schema
(Schema schema = factory.newSchema(new StreamSource[]{pe,po});),
the instance is checked and errors are reported as expected:
DTD Support : false
schema org.apache.xerces.jaxp.validation.XMLSchema@406199
validator has resource resolver ValidatorEx$4@2b3d53
validator error org.xml.sax.SAXParseException: cvc-pattern-valid:
Value '926-AAX' is not facet-valid with respect to pattern
'\d{3}-[A-Z]{2}' for type 'SKU'.
cause null
getException null
org.xml.sax.SAXParseException: cvc-pattern-valid: Value '926-AAX' is
not facet-valid with respect to pattern '\d{3}-[A-Z]{2}' for type 'SKU'.


I do not understand why the resolver is not called in the second case.

I attached my test resources.

On 8 Jul 2006, at 0:36, Michael Glavassevich wrote:

> Dick Deneer <di...@donkeydevelopment.com> wrote on 07/07/2006 05:36:29 PM:
>
>> Ok, if that's all, I will do that, but it was more the principle:
>> I do not want to know how the schema is resolved by the parser, I only
>> want to know whether it was successful or not.
>
> It's not an implementation detail. The API doc for
> EntityResolver.resolveEntity() [1] says precisely what happens: "SAX
> specifies how to interpret any InputSource returned by this method, and
> that if none is returned, then the system ID will be dereferenced as a
> URL". Saving the one line of code "new java.net.URL(systemId).openStream()"
> just doesn't seem like a good enough reason to add a new property.
>
>> Still, I believe the support for schema resolving is weaker than for
>> DTDs. For DTDs you get everything you want: the public ID, the system
>> ID or the name of the root element. All of these are also keys for the
>> grammar pool. For schemas you get nothing. The most important thing
>> (the namespace, which is also used as the key in the grammar pool,
>> which is itself an indication of its importance) is missing.
>> This makes it very hard to return your own InputSource.
>
> SAX's EntityResolver was only designed for resolving external entities
> (the external DTD subset, external parameter entities and external general
> entities). If you want the target namespace for the schema you should use
> an API which has a resolver that will pass that information (see the JAXP
> 1.3 Validation API and LSResourceResolver [2]) to you. For more info you
> should also take a look at:
> http://issues.apache.org/jira/browse/XERCESJ-1158 where this issue was
> last discussed.
>
> [1]
> http://xerces.apache.org/xerces2-j/javadocs/api/org/xml/sax/EntityResolver.html#resolveEntity(java.lang.String,%20java.lang.String)
> [2]
> http://xerces.apache.org/xerces2-j/javadocs/api/org/w3c/dom/ls/LSResourceResolver.html

Re: Using grammarpool with included schemas

Posted by Michael Glavassevich <mr...@ca.ibm.com>.

Dick Deneer <di...@donkeydevelopment.com> wrote on 07/07/2006 05:36:29 PM:

> Ok, if that's all, I will do that, but it was more the principle: 
> I do not want to know how the schema is resolved by the parser, I only 
> want to know whether it was successful or not.

It's not an implementation detail. The API doc for 
EntityResolver.resolveEntity() [1] says precisely what happens: "SAX 
specifies how to interpret any InputSource returned by this method, and 
that if none is returned, then the system ID will be dereferenced as a 
URL". Saving the one line of code "new 
java.net.URL(systemId).openStream()" just doesn't seem like a good enough 
reason to add a new property.
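The approach described here can be sketched as a SAX EntityResolver that first does what the parser would otherwise do by default (dereference the system ID as a URL) and only falls back to a user-supplied copy when that fails. The class name, the addFallback method and the choice to key fallbacks on the last path segment are hypothetical; returning null at the end simply lets the parser report its usual error.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.net.URI;
import java.util.HashMap;
import java.util.Map;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

/** Hypothetical resolver: default lookup first, user-supplied copy second. */
public class FallbackEntityResolver implements EntityResolver {
    // Fallback documents keyed by file name, e.g. "include.xsd" (an illustrative choice).
    private final Map<String, String> fallbacks = new HashMap<>();

    public void addFallback(String fileName, String content) {
        fallbacks.put(fileName, content);
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        if (systemId == null) {
            return null;
        }
        try {
            // What SAX does when no InputSource is returned: dereference the system ID as a URL.
            InputStream in = URI.create(systemId).toURL().openStream();
            InputSource found = new InputSource(in);
            found.setSystemId(systemId);
            return found;
        } catch (IllegalArgumentException | IOException notFound) {
            // Not a usable absolute URL, or nothing there: try the user's copies below.
        }
        int slash = Math.max(systemId.lastIndexOf('/'), systemId.lastIndexOf('\\'));
        String content = fallbacks.get(systemId.substring(slash + 1));
        if (content == null) {
            return null; // no fallback either: let the parser report its usual error
        }
        InputSource fallback = new InputSource(new StringReader(content));
        fallback.setSystemId(systemId);
        return fallback;
    }
}
```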

> Still, I believe the support for schema resolving is weaker than for 
> DTDs. For DTDs you get everything you want: the public ID, the system 
> ID or the name of the root element. All of these are also keys for the 
> grammar pool. For schemas you get nothing. The most important thing 
> (the namespace, which is also used as the key in the grammar pool, 
> which is itself an indication of its importance) is missing. 
> This makes it very hard to return your own InputSource.

SAX's EntityResolver was only designed for resolving external entities 
(the external DTD subset, external parameter entities and external general 
entities). If you want the target namespace for the schema you should use 
an API which has a resolver that will pass that information (see the JAXP 
1.3 Validation API and LSResourceResolver [2]) to you. For more info you 
should also take a look at: 
http://issues.apache.org/jira/browse/XERCESJ-1158 where this issue was 
last discussed.
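A sketch of such a resolver, assuming an open schema from SchemaFactory.newSchema() and the JDK's JAXP implementation: the lookup is keyed on the namespaceURI argument, which LSResourceResolver supplies and SAX's EntityResolver does not, so an unusable location hint no longer matters. PO_XSD is a trimmed, hypothetical stand-in for po.xsd that keeps only a pattern like the SKU one from the thread; the partNum root element and the po.xsd hint are also made up.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSResourceResolver;
import org.xml.sax.SAXException;

public class NamespaceCatalogDemo {
    static final String PO_NS = "http://tempuri.org/po.xsd";

    // Hypothetical, trimmed stand-in for po.xsd: one root element with an SKU-like pattern.
    static final String PO_XSD =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema' targetNamespace='" + PO_NS + "'>"
        + "<xsd:element name='partNum'><xsd:simpleType><xsd:restriction base='xsd:string'>"
        + "<xsd:pattern value='\\d{3}-[A-Z]{2}'/>"
        + "</xsd:restriction></xsd:simpleType></xsd:element></xsd:schema>";

    /** Resolver that picks the schema by target namespace and ignores the location hint. */
    static LSResourceResolver byNamespace(Map<String, String> schemas) throws Exception {
        DOMImplementationLS ls = (DOMImplementationLS) DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().getDOMImplementation();
        return (type, namespaceURI, publicId, systemId, baseURI) -> {
            String xsd = (namespaceURI == null) ? null : schemas.get(namespaceURI);
            if (xsd == null) {
                return null; // unknown namespace: default resolution
            }
            LSInput input = ls.createLSInput();
            input.setStringData(xsd);
            input.setSystemId(systemId);
            input.setBaseURI(baseURI);
            return input;
        };
    }

    static boolean isValid(String partNum) throws Exception {
        Map<String, String> catalog = new HashMap<>();
        catalog.put(PO_NS, PO_XSD);
        // Open schema: components are pulled in during validation.
        Schema open = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI).newSchema();
        Validator validator = open.newValidator();
        validator.setResourceResolver(byNamespace(catalog));
        // The hint "po.xsd" points at a file that does not exist next to the instance.
        String xml = "<partNum xmlns='" + PO_NS + "'"
                + " xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'"
                + " xsi:schemaLocation='" + PO_NS + " po.xsd'>" + partNum + "</partNum>";
        try {
            validator.validate(new StreamSource(new StringReader(xml), "file:///nonexistent/po.xml"));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("926-AA:  " + isValid("926-AA"));
        System.out.println("926-AAX: " + isValid("926-AAX"));
    }
}
```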

[1] 
http://xerces.apache.org/xerces2-j/javadocs/api/org/xml/sax/EntityResolver.html#resolveEntity(java.lang.String,%20java.lang.String)
[2] 
http://xerces.apache.org/xerces2-j/javadocs/api/org/w3c/dom/ls/LSResourceResolver.html

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org

Re: Using grammarpool with included schemas

Posted by Dick Deneer <di...@donkeydevelopment.com>.

Ok, if that's all, I will do that, but it was more the principle: I do
not want to know how the schema is resolved by the parser, I only want
to know whether it was successful or not.
Still, I believe the support for schema resolving is weaker than for
DTDs. For DTDs you get everything you want: the public ID, the system ID
or the name of the root element. All of these are also keys for the
grammar pool. For schemas you get nothing. The most important thing
(the namespace, which is also used as the key in the grammar pool, which
is itself an indication of its importance) is missing.
This makes it very hard to return your own InputSource.

On 7 Jul 2006, at 23:08, Michael Glavassevich wrote:

> Dick Deneer <di...@donkeydevelopment.com> wrote on 07/07/2006 04:00:09 PM:
>
>> You may be theoretically right, but it would obviously be very
>> practical to have it available together with the system ID in the
>> EntityResolver. Returning a schema with another namespace is just
>> useless.
>>
>> And continuing my question about whether the parser will resolve the
>> entity by itself or not, I would suggest another property where
>> you can set a kind of finalResolver, with the same method as
>> resolveEntity, that gets a callback if the parser did not find the
>> entity. Then you get a last chance to resolve it yourself.
>
> I don't see the need for this. You already get a chance in your
> EntityResolver and it can try opening an InputStream from the system ID
> (the default behaviour) and if that fails it can do something else.

Re: Using grammarpool with included schemas

Posted by Michael Glavassevich <mr...@ca.ibm.com>.

Dick Deneer <di...@donkeydevelopment.com> wrote on 07/07/2006 04:00:09 PM:

> You may be theoretically right, but it would obviously be very 
> practical to have it available together with the system ID in the 
> EntityResolver. Returning a schema with another namespace is just 
> useless.
> 
> And continuing my question about whether the parser will resolve the 
> entity by itself or not, I would suggest another property where 
> you can set a kind of finalResolver, with the same method as 
> resolveEntity, that gets a callback if the parser did not find the 
> entity. Then you get a last chance to resolve it yourself.

I don't see the need for this. You already get a chance in your 
EntityResolver and it can try opening an InputStream from the system ID 
(the default behaviour) and if that fails it can do something else.
 
Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org

Re: Using grammarpool with included schemas

Posted by Dick Deneer <di...@donkeydevelopment.com>.

You may be theoretically right, but it would obviously be very
practical to have it available together with the system ID in the
EntityResolver. Returning a schema with another namespace is just
useless.

And continuing my question about whether the parser will resolve the
entity by itself or not, I would suggest another property where
you can set a kind of finalResolver, with the same method as
resolveEntity, that gets a callback if the parser did not find the
entity. Then you get a last chance to resolve it yourself.


On 7 Jul 2006, at 20:58, Joseph Kesselman wrote:

> A namespace name, although it is expressed as a URI, is just a name.
> Normal XML processing never attempts to retrieve anything from it, so
> it is never processed by the EntityResolver.
>
> (The Semantic Web group may eventually define what, if anything, might
> be accessible through the namespace URI. But for now, treat it just as
> a string in URI syntax.)

Re: Using grammarpool with included schemas

Posted by Joseph Kesselman <ke...@us.ibm.com>.

A namespace name, although it is expressed as a URI, is just a name. Normal
XML processing never attempts to retrieve anything from it, so it is
never processed by the EntityResolver.

(The Semantic Web group may eventually define what, if anything, might be
accessible through the namespace URI. But for now, treat it just as a
string in URI syntax.)

______________________________________
"... Three things see no end: A loop with exit code done wrong,
A semaphore untested, And the change that comes along. ..."
  -- "Threes" Rev 1.1 - Duane Elms / Leslie Fish
(http://www.ovff.org/pegasus/songs/threes-rev-11.html)


Re: Using grammarpool with included schemas

Posted by Dick Deneer <di...@donkeydevelopment.com>.

Thanks for the explanation.
I am realizing that my perspective on schemas and XML was wrong.
I always took the approach where the schema was leading: "this is
my schema, so come up with that XML and I will validate it". This way
I thought I would have maximum control and keep the solution simple.
But in fact I was swimming against the stream, because the XML itself
determines the way it should be handled. You can also see this in
tools like XMLSpy, which let you change the XML source to validate
against another schema or DTD.

So I will take another approach, which is more in line with Xerces.
Today I tested a lot with the Xerces SAXParser and an
EntityResolver2, and I have a simple question.
I am putting in this XML:
<purchaseOrder orderDate="1999-10-20" xmlns="http://tempuri.org/po.xsd"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://tempuri.org/po.xsd C:\Temp\Schemas\apo.xsd">

The EntityResolver2 gets the following callbacks:

getExternalSubset (name purchaseOrder baseURI null)

resolveEntity name null publicId null baseURI null systemId C:\Temp\Schemas\apo.xsd

getExternalSubset name xs:schema baseURI file:///C:/Temp/Schemas/apo.xsd

So getExternalSubset is always called; the second time, the call comes
from the loaded schema.
Also, the baseURI in the first call is null because I did not specify a
systemId for the XML InputSource.
This is all clear to me.
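For anyone who wants to reproduce a trace like the one above, here is a minimal sketch of a recording resolver, assuming only the org.xml.sax.ext.EntityResolver2 interface that ships with the JDK (TraceResolver is a hypothetical name). Every method returns null, which tells the parser to fall back to its default behaviour of opening the system identifier itself. The exact sequence of callbacks depends on the parser and on which features are enabled.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.InputSource;
import org.xml.sax.ext.EntityResolver2;

/** Hypothetical resolver that records each callback and never supplies input itself. */
public class TraceResolver implements EntityResolver2 {

    private final List<String> calls = new ArrayList<>();

    public List<String> calls() {
        return calls;
    }

    @Override
    public InputSource getExternalSubset(String name, String baseURI) {
        calls.add("getExternalSubset name " + name + " baseURI " + baseURI);
        return null; // no external subset supplied
    }

    @Override
    public InputSource resolveEntity(String name, String publicId,
                                     String baseURI, String systemId) {
        calls.add("resolveEntity name " + name + " publicId " + publicId
                + " baseURI " + baseURI + " systemId " + systemId);
        return null; // let the parser open systemId itself
    }

    // SAX1-style method inherited from org.xml.sax.EntityResolver.
    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        return resolveEntity(null, publicId, null, systemId);
    }
}
```

It would be registered with XMLReader.setEntityResolver(...); SAX2 parsers call the EntityResolver2 methods when the feature http://xml.org/sax/features/use-entity-resolver2 is true, which is its default.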

But I wonder about the values passed to resolveEntity. I thought I
should also have the namespace
http://tempuri.org/po.xsd available in one of the parameters (I
supposed in name or publicId).
But I only get the systemId. It seems to me that the namespace is a
very relevant value for determining the right InputSource to give
back.
Also, the doc says about the name parameter:
name - Identifies the external entity being resolved. Either "[dtd]"
for the external subset, or a name starting with "%" to indicate a
parameter entity, or else the name of a general entity. This is never
null when invoked by a SAX2 parser.
Yet in my trace it is null. Am I missing something?


And can I just open a URL connection to the given systemId to check
whether the parser will resolve the entity? Or should I combine it with
the baseURI if that is not null?
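One way to approach that last question, sketched under assumptions: resolve the system identifier against the baseURI when the parser supplies one, then try to open a stream. SystemIdProbe is a hypothetical helper; Xerces applies its own expansion rules to system identifiers, so this only approximates what the parser will do, and the drive-letter handling is a heuristic.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical helper for an entity resolver: expand a system id, then probe it. */
public final class SystemIdProbe {

    private SystemIdProbe() {
    }

    /**
     * Resolves systemId against baseURI when one is given. Values that are not
     * legal URI syntax (for example C:\Temp\Schemas\apo.xsd) are treated as local file paths.
     */
    public static URI expand(String systemId, String baseURI) {
        URI ref;
        try {
            ref = new URI(systemId);
        } catch (URISyntaxException notAUri) {
            return new File(systemId).toURI();
        }
        String scheme = ref.getScheme();
        if (scheme != null && scheme.length() > 1) {
            return ref; // already absolute, e.g. file:/... or http://...
        }
        if (scheme != null) {
            // Heuristic: "C:/dir/x.xsd" parses with scheme "C"; treat it as a drive path.
            return new File(systemId).toURI();
        }
        if (baseURI != null) {
            try {
                return new URI(baseURI).resolve(ref);
            } catch (URISyntaxException badBase) {
                // fall through and resolve against the working directory instead
            }
        }
        return new File(systemId).toURI(); // relative to the current working directory
    }

    /** True if a stream can be opened for the URI right now. */
    public static boolean canOpen(URI uri) {
        try (InputStream in = uri.toURL().openStream()) {
            return true;
        } catch (IOException | IllegalArgumentException e) {
            return false;
        }
    }
}
```

For the second trace line, expand("include.xsd", "file:///C:/Temp/Schemas/apo.xsd") yields a URI whose path is /C:/Temp/Schemas/include.xsd.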



On 7-jul-2006, at 7:12, Michael Glavassevich wrote:

> Hi Dick,
>
> An XSGrammar is a collection of schema components for a given target
> namespace. The schema validator will consult the grammar pool once per
> namespace, so you only have one shot to return an XSGrammar object from
> your pool for each namespace. That doesn't prevent your grammar pool
> implementation from doing something clever like merging XSGrammars (see
> org.apache.xerces.impl.xs.XSLoaderImpl.XSGrammarMerger [1]) but that may
> be a bit difficult for you to do particularly if the schemas aren't
> disjoint. I would go the entity resolver route instead of trying that. You
> can register an XMLEntityResolver [2] with the XMLGrammarPreparser by
> calling the setEntityResolver(XMLEntityResolver) method. The parser's
> default behaviour is to open a URLConnection for the location specified.
> If you're unable to create an InputStream from the URLConnection in your
> entity resolver the parser won't succeed at doing that either.
>
> Thanks.
>
> [1]
> http://svn.apache.org/viewvc/xerces/java/trunk/src/org/apache/xerces/impl/xs/XSLoaderImpl.java?revision=406145&view=markup
> [2]
> http://xerces.apache.org/xerces2-j/javadocs/xni/org/apache/xerces/xni/parser/XMLEntityResolver.html
>
> Michael Glavassevich
> XML Parser Development
> IBM Toronto Lab
> E-mail: mrglavas@ca.ibm.com
> E-mail: mrglavas@apache.org

Re: Using grammarpool with included schemas

Posted by Michael Glavassevich <mr...@ca.ibm.com>.

Hi Dick,

An XSGrammar is a collection of schema components for a given target 
namespace. The schema validator will consult the grammar pool once per 
namespace, so you only have one shot to return an XSGrammar object from 
your pool for each namespace. That doesn't prevent your grammar pool 
implementation from doing something clever like merging XSGrammars (see 
org.apache.xerces.impl.xs.XSLoaderImpl.XSGrammarMerger [1]) but that may 
be a bit difficult for you to do particularly if the schemas aren't 
disjoint. I would go the entity resolver route instead of trying that. You 
can register an XMLEntityResolver [2] with the XMLGrammarPreparser by 
calling the setEntityResolver(XMLEntityResolver) method. The parser's 
default behaviour is to open a URLConnection for the location specified. 
If you're unable to create an InputStream from the URLConnection in your 
entity resolver the parser won't succeed at doing that either.
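The suggestion above targets Xerces' native XNI API (an XMLEntityResolver registered through XMLGrammarPreparser.setEntityResolver). The sketch below swaps in the standard JAXP equivalent so that it runs with only the JDK: an org.w3c.dom.ls.LSResourceResolver installed with SchemaFactory.setResourceResolver. Following the remark about URLConnections, it first tries to open the referenced location itself and, only if that fails, looks for a file of the same name in a fallback directory. FallbackSchemaResolver and the fallback directory are hypothetical. Unlike the SAX EntityResolver2 callbacks, resolveResource also receives a namespaceURI argument.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URISyntaxException;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSResourceResolver;

/**
 * Hypothetical resolver: if the location named by xsd:include/xsd:import cannot
 * be opened, look for a file with the same name in a fallback directory.
 * Returning null keeps the parser's default behaviour.
 */
public class FallbackSchemaResolver implements LSResourceResolver {

    private final File fallbackDir; // hypothetical: directory chosen by the user
    private final DOMImplementationLS ls;

    public FallbackSchemaResolver(File fallbackDir) throws ParserConfigurationException {
        this.fallbackDir = fallbackDir;
        // Assumption: the JDK's default DOM implementation also implements DOMImplementationLS.
        this.ls = (DOMImplementationLS) DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().getDOMImplementation();
    }

    @Override
    public LSInput resolveResource(String type, String namespaceURI, String publicId,
                                   String systemId, String baseURI) {
        // namespaceURI could be used to choose a fallback per namespace; it is ignored here.
        if (systemId == null) {
            return null;
        }
        String fileName;
        try {
            URI location = (baseURI != null)
                    ? new URI(baseURI).resolve(new URI(systemId))
                    : new URI(systemId);
            try (InputStream probe = location.toURL().openStream()) {
                return null; // reachable: let the parser load it as usual
            } catch (IOException | IllegalArgumentException unreachable) {
                // not reachable: try the fallback directory below
            }
            String path = location.getPath();
            fileName = new File(path != null ? path : systemId).getName();
        } catch (URISyntaxException notAUri) {
            fileName = new File(systemId).getName();
        }
        File candidate = new File(fallbackDir, fileName);
        if (!candidate.isFile()) {
            return null; // nothing better to offer; the parser reports the error
        }
        LSInput input = ls.createLSInput();
        input.setSystemId(candidate.toURI().toString()); // the parser opens this location
        return input;
    }
}
```

Usage would be roughly: create a SchemaFactory for XMLConstants.W3C_XML_SCHEMA_NS_URI, call setResourceResolver(new FallbackSchemaResolver(dir)), then newSchema(topFile). The resulting Schema can be reused for many validations, which plays a role similar to a preparsed grammar pool.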

Thanks.

[1] 
http://svn.apache.org/viewvc/xerces/java/trunk/src/org/apache/xerces/impl/xs/XSLoaderImpl.java?revision=406145&view=markup
[2] 
http://xerces.apache.org/xerces2-j/javadocs/xni/org/apache/xerces/xni/parser/XMLEntityResolver.html

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org

Dick Deneer <di...@donkeydevelopment.com> wrote on 07/04/2006 
01:59:17 PM:

> Hi,
> 
> I am using a grammarpool to cache schemas and want the user to point
> to a number of schemas that will be added to the grammarpool. Then
> the grammarpool is used to validate an XML instance document.
> One of the schemas (=top) uses an include of another schema (=sub).
> If I can rely on the location mentioned in the include, there is
> no problem: just add "top" to the pool and everything is fine.
> But when the path in the top schema is invalid, the included schema
> cannot be found. My first idea was to let the user add the
> subschema to the pool as well. But this has no effect. The result is
> that the subschema is simply ignored and will not be in the
> grammarpool (probably because there is already a
> grammar with the same namespace in the pool, namely "top").
> Validation will be incomplete because the type definitions in sub
> are ignored.
>
> Working in the other direction (first add the sub, then the top
> schema) does not work either: only the sub schema will be added
> to the grammarpool.
>
> The ideal situation for me would be: just add another schema to the
> pool if the parser gives an error that something cannot be found.
>
> Is there no other way than using an entity resolver?
> If that is the only way: can I detect in the entity resolver whether
> the schema will be found by the parser? If not, I have to return my
> own InputSource.
> Can I also use such an entity resolver with an XMLGrammarPreparser?
> 
> PS
> You can easily "replay" the above situations with the XMLGrammarBuilder.java
> program that is supplied with Xerces. Using the DOM or SAX parser directly
> (without preparsing) with a grammarpool gives the same results.
> I used these XML documents and schemas:
> 
> top.xsd:
> <?xml version="1.0" encoding="UTF-8"?>
> <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
> <xsd:include schemaLocation="include.xsd"/>
> <xsd:element name="myRoot" type="myRootType"/>
> <xsd:complexType name="myRootType">
> <xsd:sequence>
> <xsd:element name="label" type="labelType"/>
> </xsd:sequence>
> </xsd:complexType>
> </xsd:schema>
> 
> include.xsd:
> <?xml version="1.0" encoding="UTF-8"?>
> <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
> <xsd:simpleType name="labelType">
> <xsd:restriction base="xsd:string">
> <xsd:enumeration value="01"/>
> <xsd:enumeration value="02"/>
> </xsd:restriction>
> </xsd:simpleType>
> </xsd:schema>
> 
> instance.xml:
> <?xml version="1.0" encoding="UTF-8"?>
> <myRoot>
>     <label>028</label>
> </myRoot>
