Posted to j-users@xerces.apache.org by Andy Harris <an...@ubmatrix.com> on 2007/11/12 02:49:17 UTC

grammar pool population

Hello-

For performance reasons, I'm caching grammars for validation of an
instance against a set of connected and disconnected schemas of an XBRL
taxonomy. Schemas can be disconnected when discovered via linkbases. The
only way I can get this working is to regenerate the grammars every time,
letting Xerces construct the pool before validating the instance. Letting
Xerces parse the grammars more than once is very expensive.

Searching the Xerces-2 grammar FAQ and other mailing lists on the
internet turned up suggestions on how to "actively" construct grammar
pools (caching grammars), but I still get the following warning(s):

Warning:

One of the grammar(s) returned from the user's grammar pool is in
conflict with another grammar.

Here is my integration code for building a grammar pool from 2 or more
root schemas:

    while (schemas.hasNext()) {
        IDTSNode schemaNode = (IDTSNode) schemas.next();

        // Grammar pool belonging to this root schema
        XMLGrammarPoolImpl gp =
            (XMLGrammarPoolImpl) schemaNode.getGrammarPool();

        // Copy its XML Schema grammars into the shared pool
        Grammar[] g =
            gp.retrieveInitialGrammarSet("http://www.w3.org/2001/XMLSchema");
        grammarPool.cacheGrammars("http://www.w3.org/2001/XMLSchema", g);
    }

A root schema does not have any referencing schemas and the set of
schemas is reverse topologically sorted. Inspection of the graph of
grammars within grammarPool reveals that they are the same as when
Xerces populates the grammar pool. Is this the correct way to cache
grammars into the grammar pool?

If there is only 1 root schema, there is no additional "merging" of
grammars and the grammar pool from the single root schema can be used
directly to validate an instance.

A grammar pool is set on a configuration instance as a feature and
passed to the constructor of DOMParser:

    DOMParser parser = new DOMParser(config);

The grammarPool is locked before parsing and unlocked after parsing to
prevent further entity resolution. In addition to XML schema validation,
my application caches Grammars for PSVI analysis and XML prototyping.
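For comparison only, the same compile-once, validate-many goal can be sketched with the standard JAXP validation API instead of a Xerces XMLGrammarPool. This is a swapped-in technique, not the setup described above: all root schemas are handed to one SchemaFactory call, and the resulting Schema object is reused for every instance. The class name, the two inline schemas and the urn:a / urn:b namespaces are made up for the illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical helper: compiles several root schemas once and reuses the result.
final class CompiledSchemaCache {
    // Two unrelated ("disconnected") root schemas, inlined for the example.
    static final String SCHEMA_A =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
        + " targetNamespace='urn:a' elementFormDefault='qualified'>"
        + "<xs:element name='a' type='xs:int'/></xs:schema>";
    static final String SCHEMA_B =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
        + " targetNamespace='urn:b' elementFormDefault='qualified'>"
        + "<xs:element name='b' type='xs:string'/></xs:schema>";

    // Parses every schema text exactly once and returns one reusable Schema.
    static Schema compile(String... schemaTexts) throws SAXException {
        SchemaFactory factory =
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Source[] sources = new Source[schemaTexts.length];
        for (int i = 0; i < schemaTexts.length; i++) {
            sources[i] = new StreamSource(new StringReader(schemaTexts[i]));
        }
        return factory.newSchema(sources);
    }

    // Validates one instance against the already compiled schemas.
    static boolean isValid(Schema schema, String instance) throws IOException {
        try {
            // A Validator is cheap to create; the Schema carries the compiled grammars.
            schema.newValidator().validate(
                new StreamSource(new StringReader(instance)));
            return true;
        } catch (SAXException e) {
            return false; // the default error handling throws on a validation error
        }
    }

    public static void main(String[] args) throws Exception {
        Schema schema = compile(SCHEMA_A, SCHEMA_B); // schemas are parsed here only
        System.out.println(isValid(schema, "<a xmlns='urn:a'>1</a>"));
        System.out.println(isValid(schema, "<b xmlns='urn:b'>text</b>"));
        System.out.println(isValid(schema, "<a xmlns='urn:a'>not a number</a>"));
    }
}
```

Whether this fits an application that also needs the grammars for PSVI analysis is a separate question; the sketch only shows the caching pattern.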

Analysis of the grammarPool source code indicates that a warning is
generated because there must be two instances of Grammar with the same
target namespace. However, I don't see this when looking through the
grammarPool instance. My next step is to step through the Xerces-2
sources under my application, but I was hoping somebody could indicate
improper usage on my part.

thanks,

-Andy

 


______________________________________________________________________
This email has been scanned by the MessageLabs Email Security System.
For more information please visit http://www.messagelabs.com/email 
______________________________________________________________________

RE: grammar pool population

Posted by Michael Glavassevich <mr...@ca.ibm.com>.

Hi Andy,

There's no way to tell which parts of a SchemaGrammar came from which
schema documents. If the schemas you're loading are consistent, what you
could do is merge the components for a given target namespace into a single
SchemaGrammar object and eliminate (or replace) references to the other
ones. The XSLoader [1] (see the XSGrammarMerger inner class) will merge
together SchemaGrammar objects from the same namespace into one. Might give
you some ideas if you are interested in attempting this yourself (on a
larger scale).
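The merge described above can be pictured as folding every grammar for a namespace into one container and then using only that container. What follows is a rough sketch in plain Java, assuming a hypothetical NsGrammar type that maps component names to definitions. It is not the XSGrammarMerger code from XSLoaderImpl, it does not rewrite references between grammars, and it simply treats a name defined differently in two grammars as an inconsistency.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a schema grammar: one target namespace, named components.
final class NsGrammar {
    final String targetNamespace; // "" stands for "no target namespace" in this model
    final Map<String, String> components = new TreeMap<>(); // name -> definition

    NsGrammar(String targetNamespace) {
        this.targetNamespace = targetNamespace;
    }

    NsGrammar define(String name, String definition) {
        components.put(name, definition);
        return this;
    }
}

final class GrammarMerge {
    // Folds all grammars into exactly one NsGrammar per target namespace.
    static Map<String, NsGrammar> mergeByNamespace(List<NsGrammar> grammars) {
        Map<String, NsGrammar> merged = new TreeMap<>();
        for (NsGrammar g : grammars) {
            NsGrammar target = merged.computeIfAbsent(g.targetNamespace, NsGrammar::new);
            for (Map.Entry<String, String> c : g.components.entrySet()) {
                String previous = target.components.putIfAbsent(c.getKey(), c.getValue());
                // The same name with a different definition means the inputs are not consistent.
                if (previous != null && !previous.equals(c.getValue())) {
                    throw new IllegalStateException("Inconsistent definition of '"
                        + c.getKey() + "' in namespace " + g.targetNamespace);
                }
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<NsGrammar> loaded = new ArrayList<>();
        loaded.add(new NsGrammar("urn:shared").define("amount", "xs:decimal"));
        loaded.add(new NsGrammar("urn:shared").define("label", "xs:string"));
        loaded.add(new NsGrammar("urn:other").define("id", "xs:ID"));

        Map<String, NsGrammar> merged = mergeByNamespace(loaded);
        for (NsGrammar g : merged.values()) {
            System.out.println(g.targetNamespace + " -> " + g.components.keySet());
        }
    }
}
```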

Thanks.

[1]
http://svn.apache.org/viewvc/xerces/java/trunk/src/org/apache/xerces/impl/xs/XSLoaderImpl.java?revision=449487&view=markup

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org

"Andy Harris" <an...@ubmatrix.com> wrote on 11/16/2007 01:03:44 AM:

> Thank you for your response.
>
> Would it be possible to remove the grammar parts associated with an
> included schema? This would probably require an intelligent grammarpool
> implementation that partitions definitions in the grammar for partial
> schemas defined within the same Grammar by targetNamespace.
>
> -Andy

---------------------------------------------------------------------
To unsubscribe, e-mail: j-users-unsubscribe@xerces.apache.org
For additional commands, e-mail: j-users-help@xerces.apache.org

RE: grammar pool population

Posted by Andy Harris <an...@ubmatrix.com>.

Thank you for your response.

Would it be possible to remove the grammar parts associated with an
included schema? This would probably require an intelligent grammarpool
implementation that partitions definitions in the grammar for partial
schemas defined within the same Grammar by targetNamespace.
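To make the idea concrete, here is a small sketch of such a partitioned container in plain Java. Everything in it (the PartitionedGrammar class, its methods, the example URIs) is hypothetical and only models the bookkeeping; it is not an existing Xerces class or grammar pool implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical per-namespace container that remembers which document defined what.
final class PartitionedGrammar {
    final String targetNamespace;
    // document URI -> (component name -> definition)
    private final Map<String, Map<String, String>> byDocument = new LinkedHashMap<>();

    PartitionedGrammar(String targetNamespace) {
        this.targetNamespace = targetNamespace;
    }

    // Records a component together with the schema document that contributed it.
    void add(String documentUri, String name, String definition) {
        byDocument.computeIfAbsent(documentUri, k -> new TreeMap<String, String>())
                  .put(name, definition);
    }

    // Drops everything one schema document contributed; true if it was present.
    boolean removeDocument(String documentUri) {
        return byDocument.remove(documentUri) != null;
    }

    // Flattened view of all components, as a validator would need them.
    Map<String, String> components() {
        Map<String, String> all = new TreeMap<>();
        for (Map<String, String> part : byDocument.values()) {
            all.putAll(part);
        }
        return all;
    }

    public static void main(String[] args) {
        PartitionedGrammar g = new PartitionedGrammar("urn:example");
        g.add("main.xsd", "report", "complexType");
        g.add("included.xsd", "amount", "xs:decimal");
        System.out.println(g.components().keySet());

        g.removeDocument("included.xsd"); // remove the parts from the included schema
        System.out.println(g.components().keySet());
    }
}
```

The sketch ignores references between components, so removing a document could leave dangling references in a real grammar.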

-Andy

-----Original Message-----
From: Michael Glavassevich [mailto:mrglavas@ca.ibm.com] 
Sent: Thursday, November 15, 2007 7:28 PM
To: j-users@xerces.apache.org
Cc: Andy Harris
Subject: RE: grammar pool population

Hi Andy,

Each SchemaGrammar is a container for all the schema components of a given
target namespace. Includes have no representation in the model (though you
can query [1] the grammar for the list of document URIs which contributed
to it). All of the schema components constructed from the includes will be
stored in the same grammar object.

Thanks.

[1]
http://xerces.apache.org/xerces2-j/javadocs/xs/org/apache/xerces/xs/XSNamespaceItem.html#getDocumentLocations()

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org


RE: grammar pool population

Posted by Michael Glavassevich <mr...@ca.ibm.com>.

Hi Andy,

Each SchemaGrammar is a container for all the schema components of a given
target namespace. Includes have no representation in the model (though you
can query [1] the grammar for the list of document URIs which contributed
to it). All of the schema components constructed from the includes will be
stored in the same grammar object.

Thanks.

[1]
http://xerces.apache.org/xerces2-j/javadocs/xs/org/apache/xerces/xs/XSNamespaceItem.html#getDocumentLocations()

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org

"Andy Harris" <an...@ubmatrix.com> wrote on 11/12/2007 12:04:51 AM:

> Thanks for your response. How are includes represented in the grammar
> pool? It would seem that included schemas - which all have the same
> targetNamespace - would trigger these warnings.

RE: grammar pool population

Posted by Andy Harris <an...@ubmatrix.com>.

Thanks for your response. How are includes represented in the grammar
pool? It would seem that included schemas - which all have the same
targetNamespace - would trigger these warnings.

-----Original Message-----
From: Michael Glavassevich [mailto:mrglavas@ca.ibm.com] 
Sent: Sunday, November 11, 2007 7:16 PM
To: j-users@xerces.apache.org
Cc: Andy Harris
Subject: Re: grammar pool population

Hi Andy,

You're right. This warning is generated when a grammar pool returns
multiple grammar objects for the same target namespace. Note that this
includes imports (each SchemaGrammar has a list of them which the schema
validator's grammar bucket [1] extracts), so if you loaded several grammars
independently with different grammar pools and they had imports for the
same namespace you might run into this. If these grammar objects are truly
equal I suppose you could resolve the conflict by replacing the duplicates
with a unique instance or avoid the issue altogether by using a single
grammar pool for all of the schemas.

Thanks.

[1]
http://svn.apache.org/viewvc/xerces/java/trunk/src/org/apache/xerces/impl/xs/XSGrammarBucket.java?revision=446734&view=markup

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org


Re: grammar pool population

Posted by Michael Glavassevich <mr...@ca.ibm.com>.

Hi Andy,

You're right. This warning is generated when a grammar pool returns
multiple grammar objects for the same target namespace. Note that this
includes imports (each SchemaGrammar has a list of them which the schema
validator's grammar bucket [1] extracts), so if you loaded several grammars
independently with different grammar pools and they had imports for the
same namespace you might run into this. If these grammar objects are truly
equal I suppose you could resolve the conflict by replacing the duplicates
with a unique instance or avoid the issue altogether by using a single
grammar pool for all of the schemas.
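One way to locate such duplicates is to walk every grammar reachable from the pool, following imports, and count the distinct object instances per target namespace. The sketch below is plain Java over a hypothetical GrammarNode stand-in, not the Xerces API; with real SchemaGrammar objects the namespace and the import list would have to come from the corresponding Xerces accessors.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical stand-in for a schema grammar: a target namespace plus imported grammars.
final class GrammarNode {
    final String targetNamespace; // "" stands for "no target namespace" in this model
    final List<GrammarNode> imports = new ArrayList<>();

    GrammarNode(String targetNamespace) {
        this.targetNamespace = targetNamespace;
    }
}

final class GrammarConflictCheck {
    // Returns namespace -> number of distinct instances, for namespaces with more than one.
    static Map<String, Integer> findConflicts(List<GrammarNode> roots) {
        Map<String, Set<GrammarNode>> byNamespace = new TreeMap<>();
        // Identity-based sets: two equal-looking grammars still count as two instances.
        Set<GrammarNode> visited =
            Collections.newSetFromMap(new IdentityHashMap<GrammarNode, Boolean>());
        Deque<GrammarNode> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            GrammarNode g = pending.pop();
            if (!visited.add(g)) {
                continue; // already seen this exact instance
            }
            byNamespace.computeIfAbsent(g.targetNamespace,
                k -> Collections.newSetFromMap(new IdentityHashMap<GrammarNode, Boolean>()))
                .add(g);
            pending.addAll(g.imports); // imports count as well
        }
        Map<String, Integer> conflicts = new TreeMap<>();
        for (Map.Entry<String, Set<GrammarNode>> e : byNamespace.entrySet()) {
            if (e.getValue().size() > 1) {
                conflicts.put(e.getKey(), e.getValue().size());
            }
        }
        return conflicts;
    }

    public static void main(String[] args) {
        // Two roots loaded independently, each with its own copy of the same import.
        GrammarNode rootA = new GrammarNode("urn:a");
        rootA.imports.add(new GrammarNode("urn:shared"));
        GrammarNode rootB = new GrammarNode("urn:b");
        rootB.imports.add(new GrammarNode("urn:shared"));

        System.out.println(findConflicts(Arrays.asList(rootA, rootB)));
    }
}
```

If both roots imported one and the same instance, the check would report nothing, which corresponds to the "unique instance" resolution mentioned above.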

Thanks.

[1]
http://svn.apache.org/viewvc/xerces/java/trunk/src/org/apache/xerces/impl/xs/XSGrammarBucket.java?revision=446734&view=markup

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org
