Posted to c-dev@xerces.apache.org by Daniel Jackson <af...@gmail.com> on 2008/04/24 18:39:43 UTC

XMLGrammarPool issues

I'm trying to implement some sort of caching in a system I'm writing using
the XMLGrammarPool.
What I tried so far was to create a GrammarResolver, extract its
XMLGrammarPool and give that to the parsers I create (DOM and SAX). I
configure the parser to cache schemas during parsing so the pool gets filled
when my parser resolves external entities.
I still have several open issues though:
1. When the parser resolves 2 schemas with the same namespace, how are those
stored inside the pool? I peeked at the implementation and noticed it uses
the target namespace as its key.
2. Is there a way to retrieve a schema from the pool using the system id
that was given to it when it was being resolved?
3. I want to achieve thread-safety when attaching the pool to a parser and
adding grammars to it. I saw that the pool provides 2 methods, lock and
unlock although I believe that those are not the ones I'm after. After
trying to use them between calls to loadGrammar I saw that it simply stops
adding grammars to the pool.

Thanks
-- 
View this message in context: http://www.nabble.com/XMLGrammarPool-issues-tp16851037p16851037.html
Sent from the Xerces - C - Dev mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: c-dev-unsubscribe@xerces.apache.org
For additional commands, e-mail: c-dev-help@xerces.apache.org


Re: XMLGrammarPool issues

Posted by Daniel Jackson <af...@gmail.com>.
Thank you both for your answers, I believe I have everything I need to make
this work. Thanks again!
-- 
View this message in context: http://www.nabble.com/XMLGrammarPool-issues-tp16851037p16904443.html


Re: XMLGrammarPool issues

Posted by Boris Kolpackov <bo...@codesynthesis.com>.
Hi Daniel,

Daniel Jackson <af...@gmail.com> writes:

> Can you write a small piece of code explaining exactly what you meant?

Sorry, I don't think I will have time for this any time soon.

Boris

-- 
Boris Kolpackov, Code Synthesis Tools   http://codesynthesis.com/~boris/blog
Open source XML data binding for C++:   http://codesynthesis.com/products/xsd
Mobile/embedded validating XML parsing: http://codesynthesis.com/products/xsde



Re: XMLGrammarPool issues

Posted by Daniel Jackson <af...@gmail.com>.

Boris Kolpackov-2 wrote:
> 
> Hi Daniel,
> 
> That would be the easiest way. Another alternative would be to have an
> upgradable rw mutex corresponding to the XMLGrammarPool instance. When
> starting the parsing process in a thread you would read-lock this mutex.
> If the entity resolver is called during parsing to resolve the schema
> (that is, a cache miss) then you would relock the mutex to write-lock
> before returning the schema. This will ensure that the parser invocation
> which adds the schema to the pool uses the pool exclusively. Then when
> the parser returns you release the lock. I think this will work.
> 

I tried messing with this today and got a bit lost with the mutexes. It
works, but I can't be sure that it'll work in every scenario. Can you write a
small piece of code explaining exactly what you meant?

Thanks
-- 
View this message in context: http://www.nabble.com/XMLGrammarPool-issues-tp16851037p16910504.html


Re: XMLGrammarPool issues

Posted by Boris Kolpackov <bo...@codesynthesis.com>.
Hi Daniel,

Daniel Jackson <af...@gmail.com> writes:

> Ok, I think I got it. I'm not sure how I would use this in my system though.
> Basically I have something like 100 schemas that the XMLs I validate might
> need. So if I understand what you're saying correctly, before starting to
> actually parse the XMLs I should load all those schemas into an
> XMLGrammarPool and only when that loading is completed I should start the
> parsing.

That would be the easiest way. Another alternative would be to have an
upgradable rw mutex corresponding to the XMLGrammarPool instance. When
starting the parsing process in a thread you would read-lock this mutex.
If the entity resolver is called during parsing to resolve the schema
(that is, a cache miss) then you would relock the mutex to write-lock
before returning the schema. This will ensure that the parser invocation
which adds the schema to the pool uses the pool exclusively. Then when
the parser returns you release the lock. I think this will work.
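
A rough sketch of the read-lock / write-lock idea described above, using only the C++17 standard library. Everything in it is hypothetical: ToySharedPool stands in for a pool plus its mutex, parseUsing stands in for one parser run, and the urn:example:* namespaces are made up. One deliberate departure: std::shared_mutex cannot upgrade a lock in place, so on a cache miss the sketch releases the read lock, takes the write lock, and lets the insert re-check whether another thread got there first. Whether releasing the read lock in the middle of a real parse is safe is not covered in this thread, so treat that step as an assumption.

```cpp
#include <atomic>
#include <cstddef>
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <thread>
#include <vector>

// Toy pool guarded by a reader/writer mutex. Hypothetical, not Xerces-C++.
class ToySharedPool {
public:
    // Models one parser run that needs the grammar for `ns`.
    // Returns true only if this call added the grammar (a cache miss).
    bool parseUsing(const std::string& ns) {
        std::shared_lock<std::shared_mutex> read(mutex_);  // shared for the parse
        if (grammars_.count(ns) != 0)
            return false;                                  // cache hit
        // Cache miss: no in-place upgrade exists, so drop the read lock and
        // take the write lock. Another thread may add `ns` in between.
        read.unlock();
        std::unique_lock<std::shared_mutex> write(mutex_);
        // emplace() is the re-check: it inserts nothing if `ns` is present.
        return grammars_.emplace(ns, "grammar for " + ns).second;
    }

    std::size_t size() const {
        std::shared_lock<std::shared_mutex> read(mutex_);
        return grammars_.size();
    }

private:
    mutable std::shared_mutex mutex_;
    std::map<std::string, std::string> grammars_;
};

// Four threads each "parse" documents needing the same three namespaces.
// Each namespace should be added exactly once, whichever thread wins.
inline bool demoSharedPool() {
    ToySharedPool pool;
    std::atomic<int> misses{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < 4; ++t)
        workers.emplace_back([&pool, &misses] {
            for (const char* ns : {"urn:example:a", "urn:example:b", "urn:example:c"})
                if (pool.parseUsing(ns)) ++misses;
        });
    for (auto& w : workers) w.join();
    return misses == 3 && pool.size() == 3;
}
```

A lock type with a true upgrade operation would avoid the gap between the two locks, but then two readers that both try to upgrade need a rule for who goes first, which is why the sketch takes the simpler release-and-recheck route.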

Boris

-- 
Boris Kolpackov, Code Synthesis Tools   http://codesynthesis.com/~boris/blog
Open source XML data binding for C++:   http://codesynthesis.com/products/xsd
Mobile/embedded validating XML parsing: http://codesynthesis.com/products/xsde



Re: XMLGrammarPool issues

Posted by David Bertoni <db...@apache.org>.
Daniel Jackson wrote:
> 
> 
> David Bertoni wrote:
>> You can't store two different schemas with the same target namespace URI 
>> in the same grammar pool.  As you've noticed, this is because the pool 
>> uses the target namespace URI as the key.  It works this way because 
>> that's how the parser finds a grammar for a particular element during 
>> validation -- it uses the namespace URI of the element to validate.
>>
> 
> I see. So if I have a scenario as follows: I want to validate an XML file
> with schema A. Schema A includes schema B (both have the same namespace).
> When using one of the parsers and calling loadGrammar to load schema A, the
> parser will try to resolve schema B but will not cache it? Where does it go
> then?
As long as schema A includes B, all of the components defined in B will 
be available.  What won't work is if schema A doesn't include schema B, 
but you need components in both schemas, so you attempt to load both 
grammars separately.

> And if afterwards I try to use that XMLGrammarPool with a different parser
> will the same XML pass schema validation? Or will it fail because it can't
> find schema B?
As long as one of the schema documents includes the other, you're fine.
If that's not the case, the easiest thing to do is to create a single
top-level schema document that just includes both A and B.

Does that clarify things?

Dave



Re: XMLGrammarPool issues

Posted by Daniel Jackson <af...@gmail.com>.


David Bertoni wrote:
> 
> You can't store two different schemas with the same target namespace URI 
> in the same grammar pool.  As you've noticed, this is because the pool 
> uses the target namespace URI as the key.  It works this way because 
> that's how the parser finds a grammar for a particular element during 
> validation -- it uses the namespace URI of the element to validate.
> 

I see. So if I have a scenario as follows: I want to validate an XML file
with schema A. Schema A includes schema B (both have the same namespace).
When using one of the parsers and calling loadGrammar to load schema A, the
parser will try to resolve schema B but will not cache it? Where does it go
then?
And if afterwards I try to use that XMLGrammarPool with a different parser
will the same XML pass schema validation? Or will it fail because it can't
find schema B?


David Bertoni wrote:
> 
> lock() allows for thread-safe reads, but not writes, because it prevents 
> adding new grammars and implements a thread-safe XMLStringPool. 
> However, you need to be very careful with using lock() and unlock() 
> since these member functions are not thread-safe themselves.  The use 
> case is to add a bunch of grammars, lock it to ensure the pool is not 
> mutated, then allow multiple threads to have read-only access to it.  In 
> particular, you should never lock or unlock a grammar pool after you've 
> given it to a parser instance.
> 
> If you really want a fully thread-safe grammar pool, you'll have to wrap 
> the existing pool and use a mutex to synchronize the public member 
> functions.
> 

Ok, I think I got it. I'm not sure how I would use this in my system though.
Basically I have something like 100 schemas that the XMLs I validate might
need. So if I understand what you're saying correctly, before starting to
actually parse the XMLs I should load all those schemas into an
XMLGrammarPool and only when that loading is completed I should start the
parsing.

-- 
View this message in context: http://www.nabble.com/XMLGrammarPool-issues-tp16851037p16892292.html


Re: XMLGrammarPool issues

Posted by David Bertoni <db...@apache.org>.
Daniel Jackson wrote:
> I'm trying to implement some sort of caching in a system I'm writing using
> the XMLGrammarPool.
> What I tried so far was to create a GrammarResolver, extract its
> XMLGrammarPool and give that to the parsers I create (DOM and SAX). I
> configure the parser to cache schemas during parsing so the pool gets filled
> when my parser resolves external entities.
> I still have several open issues though:
> 1. When the parser resolves 2 schemas with the same namespace, how are those
> stored inside the pool? I peeked at the implementation and noticed it uses
> the target namespace as its key.
You can't store two different schemas with the same target namespace URI 
in the same grammar pool.  As you've noticed, this is because the pool 
uses the target namespace URI as the key.  It works this way because 
that's how the parser finds a grammar for a particular element during 
validation -- it uses the namespace URI of the element to validate.
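
To make the "one slot per target namespace" point concrete, here is a toy model in plain C++. None of it is Xerces-C++ code: ToyGrammar, ToyGrammarPool, the urn:example:orders namespace and the a.xsd / b.xsd system ids are all invented for illustration. In particular, refusing the second grammar and keeping the first is an assumption of the toy; this thread does not say what the real pool does when it is handed a duplicate key.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy stand-in for a cached grammar. Hypothetical type, not Xerces-C++.
struct ToyGrammar {
    std::string targetNamespace;  // the only key the pool knows about
    std::string systemId;         // remembered, but never indexed
};

// Toy pool keyed by target namespace, as described in the reply above.
class ToyGrammarPool {
public:
    // Returns false if a grammar with this namespace is already cached.
    // Assumption: the toy refuses the duplicate and keeps the first one.
    bool cache(const ToyGrammar& g) {
        return grammars_.emplace(g.targetNamespace, g).second;
    }

    // Lookup is by namespace only; the toy offers no lookup by system id.
    const ToyGrammar* find(const std::string& ns) const {
        auto it = grammars_.find(ns);
        return it == grammars_.end() ? nullptr : &it->second;
    }

private:
    std::map<std::string, ToyGrammar> grammars_;
};

// Caches two schemas that share one namespace and reports which system id
// the pool ends up holding for that namespace.
inline std::string demoNamespaceKey() {
    ToyGrammarPool pool;
    const bool first  = pool.cache({"urn:example:orders", "a.xsd"});
    const bool second = pool.cache({"urn:example:orders", "b.xsd"});
    assert(first && !second);  // one slot per namespace
    return pool.find("urn:example:orders")->systemId;
}
```

In the toy, the only way to reach a grammar is through its namespace, so a second schema document with the same namespace has nowhere to go unless its components arrive through the first one.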

> 2. Is there a way to retrieve a schema from the pool using the system id
> that was given to it when it was being resolved?
No.

> 3. I want to achieve thread-safety when attaching the pool to a parser and
> adding grammars to it. I saw that the pool provides 2 methods, lock and
> unlock although I believe that those are not the ones I'm after. After
> trying to use them between calls to loadGrammar I saw that it simply stops
> adding grammars to the pool.
lock() allows for thread-safe reads, but not writes, because it prevents 
adding new grammars and implements a thread-safe XMLStringPool. 
However, you need to be very careful with using lock() and unlock() 
since these member functions are not thread-safe themselves.  The use 
case is to add a bunch of grammars, lock it to ensure the pool is not 
mutated, then allow multiple threads to have read-only access to it.  In 
particular, you should never lock or unlock a grammar pool after you've 
given it to a parser instance.
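
The use case above (fill the pool, lock it, then share it read-only) can be sketched with a toy pool in plain C++. ToyLockablePool and the urn:example:* namespaces are hypothetical, and the toy's lock() is just a flag; it only mirrors the behaviour described in this thread, where a locked pool stops accepting new grammars.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <thread>
#include <vector>

// Toy pool with a lock flag. Hypothetical, not the Xerces-C++ class.
class ToyLockablePool {
public:
    // Adds a grammar unless the pool is locked or the namespace is taken.
    bool cache(const std::string& ns, const std::string& grammar) {
        if (locked_) return false;  // a locked pool accepts nothing new
        return grammars_.emplace(ns, grammar).second;
    }

    // Not thread-safe themselves: call only while one thread owns the pool.
    void lock()   { locked_ = true; }
    void unlock() { locked_ = false; }

    bool contains(const std::string& ns) const {
        return grammars_.count(ns) != 0;
    }

private:
    bool locked_ = false;
    std::map<std::string, std::string> grammars_;
};

// Phase 1 fills and locks the pool on one thread; phase 2 lets four
// threads read it concurrently. Returns the total number of lookups
// that found a grammar.
inline int demoPreloadThenLock() {
    ToyLockablePool pool;
    pool.cache("urn:example:a", "grammar A");
    pool.cache("urn:example:b", "grammar B");
    pool.lock();
    assert(!pool.cache("urn:example:c", "grammar C"));  // rejected once locked

    std::vector<int> hits(4, 0);
    std::vector<std::thread> workers;
    for (int i = 0; i < 4; ++i)
        workers.emplace_back([&pool, &hits, i] {
            // Read-only access; each thread writes only its own slot.
            hits[i] = (pool.contains("urn:example:a") ? 1 : 0) +
                      (pool.contains("urn:example:b") ? 1 : 0);
        });
    for (auto& w : workers) w.join();

    int total = 0;
    for (int h : hits) total += h;
    return total;
}
```

The toy pool is never mutated after lock(), which is what makes the unsynchronized concurrent reads in phase 2 acceptable here.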

If you really want a fully thread-safe grammar pool, you'll have to wrap 
the existing pool and use a mutex to synchronize the public member 
functions.
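
A minimal sketch of the wrapping idea, again on a toy pool rather than the real class. ToyPool, SynchronizedPool and the urn:example:* keys are hypothetical names; a real wrapper would have to forward every public member function of the pool it wraps, which this sketch does not attempt.

```cpp
#include <cstddef>
#include <map>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Unsynchronized toy pool. Hypothetical stand-in for the wrapped pool.
class ToyPool {
public:
    bool cache(const std::string& ns, const std::string& grammar) {
        return grammars_.emplace(ns, grammar).second;
    }
    bool contains(const std::string& ns) const {
        return grammars_.count(ns) != 0;
    }
    std::size_t size() const { return grammars_.size(); }

private:
    std::map<std::string, std::string> grammars_;
};

// Wrapper that takes one mutex around every forwarded call.
class SynchronizedPool {
public:
    bool cache(const std::string& ns, const std::string& grammar) {
        std::lock_guard<std::mutex> guard(mutex_);
        return inner_.cache(ns, grammar);
    }
    bool contains(const std::string& ns) const {
        std::lock_guard<std::mutex> guard(mutex_);
        return inner_.contains(ns);
    }
    std::size_t size() const {
        std::lock_guard<std::mutex> guard(mutex_);
        return inner_.size();
    }

private:
    mutable std::mutex mutex_;  // mutable so const members can lock it
    ToyPool inner_;
};

// Four threads race to cache the same ten namespaces; the wrapper
// serializes the calls, so each namespace is stored exactly once.
inline std::size_t demoSynchronizedPool() {
    SynchronizedPool pool;
    std::vector<std::thread> workers;
    for (int t = 0; t < 4; ++t)
        workers.emplace_back([&pool] {
            for (int i = 0; i < 10; ++i)
                pool.cache("urn:example:" + std::to_string(i), "grammar");
        });
    for (auto& w : workers) w.join();
    return pool.size();
}
```

The mutex only protects the calls themselves. If a wrapped member function hands out a pointer or reference into the pool, whatever the caller does with it afterwards happens outside the lock.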

Dave
