
Posted to j-dev@xerces.apache.org by Mikael Helbo Kjær <mh...@dia.dk> on 2000/07/25 09:10:50 UTC

Re: Ideas for Xerces redesign

Hi ya all.
Hmm, other ideas, since you ask, Andy ;)

Well, the GrammarCache should just be an associated class (possibly a static
class written in a thread-safe manner) of the GrammarValidation class (using
a basic DTDValidator or a SchemaValidator). Making this class static would
let multiple sources draw from it: the Validator could look grammars up in
it, and so could the Application. Both the Validator and the application
could also store (externalize to disk) the DTD/Schema in it, thereby reducing
the overhead of fetching the DTD/Schema from the Inter/Intranet to just
verifying that a new version isn't available. Maybe you could even leave the
gathering of the Grammar from the Internet up to the GrammarCache, so that
the Validator only asks for the Grammar and receives it from the cache or the
net, but receives it no matter what.

Following my (and your) thought further, the same thing (caching/looking
up/gathering) could also be done in XInclude and the EntityResolver.
Notice that all this would allow for several small "modules" like Validation
(and the cache), EntityResolver, XInclude and so on (as I
mentioned the first time).
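The same fetch-once idea applied to entity resolution, as a sketch: the class below implements the standard SAX org.xml.sax.EntityResolver interface and keeps each external entity's bytes in memory after the first request. CachingEntityResolver and its fetcher function are hypothetical; the fetcher stands in for an HTTP, file or catalog lookup.

```java
import java.io.ByteArrayInputStream;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

class CachingEntityResolver implements EntityResolver {
    // systemId -> raw bytes of the external entity, fetched once
    private final Map<String, byte[]> cache = new ConcurrentHashMap<>();
    // Hypothetical fetch hook (HTTP, file system, catalog...)
    private final Function<String, byte[]> fetcher;

    CachingEntityResolver(Function<String, byte[]> fetcher) {
        this.fetcher = fetcher;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        if (systemId == null) {
            return null; // null tells the parser to use its default resolution
        }
        byte[] bytes = cache.computeIfAbsent(systemId, fetcher);
        if (bytes == null) {
            return null; // fetch failed: fall back to the parser's default behaviour
        }
        // A fresh stream per request, since a parser consumes the stream it is given.
        InputSource source = new InputSource(new ByteArrayInputStream(bytes));
        source.setPublicId(publicId);
        source.setSystemId(systemId); // lets relative URIs inside the entity resolve
        return source;
    }
}
```

With a SAX parser it would be installed through XMLReader.setEntityResolver(...); an XInclude processor could reuse the same fetch-once cache for included documents.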

Since much of the talk is about XMLString and the performance optimizations in
JDK 1.1 (ancient history) versus JDK 1.3 (the release you should be aiming
for, as it will be supported on all relevant platforms except maybe some
obscure releases of UNIXes (as if that is relevant), old Macs (who cares) and
FreeBSD (this could be a problem)), I would like to ask whether there is any
performance to be gained anymore by making pools of objects (in this case of
strings) that would function as factories, using three simple methods: void
setup(int noOfObjects), XMLString getString() and void freeString(XMLString
theprodigalsonreturnth). This is an open question for you gurus out there (as
there is no need for a thread pool, I won't suggest it).
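For reference, a pool with exactly the three methods proposed above might look like the sketch below. XMLString here is a hypothetical holder (character array, offset, length), not the Xerces class, and the pool is deliberately unsynchronized on the assumption of one pool per parser.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class XMLStringPoolSketch {

    /** Hypothetical mutable string holder: a window into a character buffer. */
    static final class XMLString {
        char[] ch;
        int offset;
        int length;

        void clear() {
            ch = null;
            offset = 0;
            length = 0;
        }
    }

    /** Single-threaded object pool; one instance per parser is assumed. */
    static final class XMLStringPool {
        private final Deque<XMLString> free = new ArrayDeque<>();

        /** Pre-allocates noOfObjects instances. */
        void setup(int noOfObjects) {
            for (int i = 0; i < noOfObjects; i++) {
                free.push(new XMLString());
            }
        }

        /** Hands out a pooled instance, or allocates one if the pool is empty. */
        XMLString getString() {
            XMLString s = free.poll();
            return s != null ? s : new XMLString();
        }

        /** Clears the instance and returns it to the pool for reuse. */
        void freeString(XMLString theprodigalsonreturnth) {
            theprodigalsonreturnth.clear();
            free.push(theprodigalsonreturnth);
        }

        /** Number of idle instances currently in the pool. */
        int available() {
            return free.size();
        }
    }
}
```

Whether this wins anything is exactly the open question: generational collectors such as HotSpot's make short-lived allocations cheap, so a pool like this would need to be benchmarked against plain allocation before being adopted.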

Mikael Helbo Kjær
Software Developer @ DIA a/s

Oh, wait a minute. One more thing. Some time ago I made a small test of
parsers (Oracle XMLParser, AElfred, Xerces and so on), and I am doing it
again soon (this time on Windows 2000 running the JDK 1.3 client and
HotSpot Server 2.0), so anyone with suggestions for the tests should email
me within this month or next (I'm mired in work, so it won't be finished any
time soon).

Re: Ideas for Xerces redesign

Posted by Eric Ye <er...@locus.apache.org>.

This discussion on grammar caching really leads to the question of what the
typical use cases of XML parsing and validation will be. Here are some I can
think of:

1. A parser object is created to parse ONE XML file (or stream), simply
check its well-formedness, and is then thrown away. Single thread.

2. A parser object is created to parse many XML files and check their
well-formedness. Single thread.

3. A pool of parser objects is created to parse many XML files; every parser
object works in its own thread and has its own SymbolTable. Multiple threads.

4. A pool of parser objects is created to parse many XML files; every parser
object works in its own thread, and all parser objects share ONE SymbolTable.
Multiple threads.

5. Cases 1, 2, 3 and 4 with validation, and also sharing of a GrammarPool on
top of the SymbolTable. Another factor could be client-side versus
server-side use.

Anything else?

We may need to take all these use cases into consideration, along with how
often each is deployed in applications, when we design Xerces2.

Eric Ye * IBM, JTC - Silicon Valley * ericye@locus.apache.org

----- Original Message -----
From: "Andy Clark" <an...@apache.org>
To: <xe...@xml.apache.org>
Sent: Tuesday, July 25, 2000 11:06 AM
Subject: Re: Ideas for Xerces redesign


> Mikael Helbo Kjær wrote:
> > Well the GrammarCache should just be an associated class (possibly
> > a static class which has been written in a threadsafe manner) of the
>
> I was thinking about this one. In cases where only separate parsers
> are used and there is no shared grammar cache, we don't
> need to think about thread safety. The question you always have to
> ask is whether everyone should pay the price of synchronization
> for the few people that need it.
>
> Instead of synchronizing access to the grammar cache by default,
> we could provide a SynchronizedGrammarCache that would simply
> synchronize the access for when people were sharing a grammar
> cache among parsers.
>
> I have a CachingParserPool class that allows a number of parsers
> to be constructed that share the same symbol table and grammar
> cache. I worked up an experimental implementation and I made
> synchronized wrappers of the symbol table and grammar pool so
> that a standalone parser wouldn't pay that penalty.
>
> > I would like to ask if there is any performance to be gained
> > anymore by making pools of objects (in this case of strings)
>
> The symbol table is not really a string factory. Its primary
> purpose is to ensure that symbol String objects are the same
> reference so that we can perform reference compares on element and
> attribute names.
>
> Unlike in the old code where we could defer transcoding of
> the underlying encoded bytes until they were actually needed,
> we're assuming that it's always transcoded in the new design.
> This greatly simplifies the code -- more than anything else,
> actually.
>
> So anything that is not a symbol would just be passed as an
> array of characters. If the application wants to use it as
> a string, it would be responsible for creating a string from
> the characters.
>
> --
> Andy Clark * IBM, JTC - Silicon Valley * andyc@apache.org
>

RE: Ideas for Xerces redesign

Posted by Ed Staub <es...@mediaone.net>.

Andy Clark wrote:
>I was thinking about this one. In cases where only separate parsers
>are used and there is no shared grammar cache, we don't
>need to think about thread safety. The question you always have to
>ask is whether everyone should pay the price of synchronization
>for the few people that need it.

>Instead of synchronizing access to the grammar cache by default,
>we could provide a SynchronizedGrammarCache that would simply
>synchronize the access for when people were sharing a grammar
>cache among parsers.

An alternative approach would be to "clone on demand" when a second thread
requests a grammar which is already in use.  The cache would need to be able
to track multiple copies of grammars.

If cloning is thread-safe, then this approach would only require
synchronization for the initial acquisition of a grammar from the cache and
its subsequent release; most calls could be unsynchronized. It would also
remove contention bottlenecks, especially in multiprocessor
configurations.

For best results, the clients would need to release grammars when they are
no longer needed.
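A sketch of the clone-on-demand idea, assuming the grammar can make a safe copy of a prototype that is never handed out or mutated. The names (CloningGrammarCacheSketch, Grammar, copy, acquire, release) are hypothetical. Only acquire() and release() take the cache's lock; once a thread holds its copy, it uses it without synchronization.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class CloningGrammarCacheSketch {

    /** Hypothetical grammar carrying mutable per-use state. */
    static final class Grammar {
        final String systemId;
        int scratch; // stands in for state a validator may mutate while using it

        Grammar(String systemId) { this.systemId = systemId; }

        /** Assumed thread-safe: reads only the immutable part of the prototype. */
        Grammar copy() { return new Grammar(systemId); }
    }

    static final class CloningGrammarCache {
        private final Map<String, Grammar> prototypes = new HashMap<>(); // never handed out
        private final Map<String, Deque<Grammar>> idle = new HashMap<>(); // released copies
        private final Function<String, Grammar> loader;

        CloningGrammarCache(Function<String, Grammar> loader) { this.loader = loader; }

        /** Returns an idle copy if one exists, otherwise clones the prototype. */
        Grammar acquire(String systemId) {
            Grammar prototype;
            synchronized (this) {
                Deque<Grammar> copies = idle.get(systemId);
                if (copies != null && !copies.isEmpty()) {
                    return copies.pop(); // reuse a released copy
                }
                // Loads under the lock for simplicity.
                prototype = prototypes.computeIfAbsent(systemId, loader);
            }
            // Every existing copy is busy: clone outside the lock.
            return prototype.copy();
        }

        /** Makes a copy available to the next caller. */
        synchronized void release(Grammar grammar) {
            idle.computeIfAbsent(grammar.systemId, k -> new ArrayDeque<>()).push(grammar);
        }
    }
}
```

Because the loader runs while holding the lock, a slow network fetch blocks other callers; a real cache would probably load outside it. Copies that clients never release are not reused, and they become garbage once the clients drop them.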

Is this useful?

-Ed Staub

Re: Ideas for Xerces redesign

Posted by Andy Clark <an...@apache.org>.

Mikael Helbo Kjær wrote:
> Well the GrammarCache should just be an associated class (possibly 
> a static class which has been written in a threadsafe manner) of the

I was thinking about this one. In cases where only separate parsers
are used and there is no shared grammar cache, we don't
need to think about thread safety. The question you always have to
ask is whether everyone should pay the price of synchronization
for the few people that need it.

Instead of synchronizing access to the grammar cache by default,
we could provide a SynchronizedGrammarCache that would simply
synchronize the access for when people were sharing a grammar
cache among parsers.
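The opt-in wrapper described here is a plain decorator, much like java.util.Collections.synchronizedMap. A sketch with hypothetical names (GrammarCacheWrappersSketch, GrammarCache, BasicGrammarCache, SynchronizedGrammarCache), using Object as a stand-in grammar type:

```java
import java.util.HashMap;
import java.util.Map;

class GrammarCacheWrappersSketch {

    /** Hypothetical cache interface; Object stands in for a grammar type. */
    interface GrammarCache {
        Object getGrammar(String key);
        void putGrammar(String key, Object grammar);
    }

    /** Default cache: no locking, for a parser that owns its cache. */
    static final class BasicGrammarCache implements GrammarCache {
        private final Map<String, Object> grammars = new HashMap<>();

        public Object getGrammar(String key) { return grammars.get(key); }

        public void putGrammar(String key, Object grammar) { grammars.put(key, grammar); }
    }

    /** Opt-in wrapper that serializes access when parsers share one cache. */
    static final class SynchronizedGrammarCache implements GrammarCache {
        private final GrammarCache delegate;

        SynchronizedGrammarCache(GrammarCache delegate) { this.delegate = delegate; }

        public synchronized Object getGrammar(String key) { return delegate.getGrammar(key); }

        public synchronized void putGrammar(String key, Object grammar) {
            delegate.putGrammar(key, grammar);
        }
    }
}
```

A standalone parser gets a BasicGrammarCache and pays no locking cost; only code that shares one cache between parsers wraps it.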

I have a CachingParserPool class that allows a number of parsers
to be constructed that share the same symbol table and grammar
cache. I worked up an experimental implementation and I made
synchronized wrappers of the symbol table and grammar pool so
that a standalone parser wouldn't pay that penalty.

> I would like to ask if there is any performance to be gained 
> anymore by making pools of objects (in this case of strings) 

The symbol table is not really a string factory. Its primary
purpose is to ensure that symbol String objects are the same
reference so that we can perform reference compares on element and
attribute names.
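A sketch of the interning idea, assuming a hash map from name to canonical String. SymbolTableSketch and addSymbol are illustrative names, not necessarily the Xerces API. The sketch allocates a temporary String per lookup, which a real symbol table would avoid by hashing the characters directly.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Returns one canonical String per distinct name, so element and attribute
 * names can be compared with == instead of equals(). Not thread-safe: a
 * shared table would need a synchronized wrapper.
 */
class SymbolTableSketch {
    private final Map<String, String> symbols = new HashMap<>();

    /** Returns the canonical String for buffer[offset .. offset+length). */
    String addSymbol(char[] buffer, int offset, int length) {
        String candidate = new String(buffer, offset, length);
        // putIfAbsent returns the existing canonical instance, or null if candidate was stored.
        String existing = symbols.putIfAbsent(candidate, candidate);
        return existing != null ? existing : candidate;
    }
}
```

Because both occurrences of a name come back as the same object, a validator can test name == expected rather than calling equals().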

Unlike in the old code where we could defer transcoding of
the underlying encoded bytes until they were actually needed,
we're assuming that it's always transcoded in the new design. 
This greatly simplifies the code -- more than anything else, 
actually.

So anything that is not a symbol would just be passed as an
array of characters. If the application wants to use it as
a string, it would be responsible for creating a string from
the characters.
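SAX already follows this "array of characters" convention: ContentHandler.characters(char[] ch, int start, int length) hands the application a window into the parser's buffer, and the application decides whether to copy it. A small sketch using the JDK's SAX API; the class name TitleCollector and the element name "title" are only examples.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Builds a String only for the character data it actually needs. */
class TitleCollector extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private boolean inTitle;
    String title;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        inTitle = "title".equals(qName);
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inTitle) {
            text.append(ch, start, length); // copy only what we need
        }
        // All other character data is skipped without creating a String.
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (inTitle) {
            title = text.toString();
            inTitle = false;
        }
    }

    public static void main(String[] args) throws Exception {
        TitleCollector handler = new TitleCollector();
        SAXParserFactory.newInstance().newSAXParser().parse(
                new InputSource(new StringReader("<doc><title>Xerces 2</title><body>...</body></doc>")),
                handler);
        System.out.println(handler.title); // Xerces 2
    }
}
```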

-- 
Andy Clark * IBM, JTC - Silicon Valley * andyc@apache.org

Re: Ideas for Xerces redesign

Posted by James Duncan Davidson <du...@x180.com>.

on 7/25/00 12:10 AM, Mikael Helbo Kjær at mhk@dia.dk wrote:

> As much talk is about the XMLString and the performance optimizations in
> JDK1.1 (ancient history) versus JDK1.3 (the release you should be aiming
> for, as it will be supported on all relevant platforms except maybe some
> obscure releases of UNIXes(as if that is relevant), old Macs (who cares) and
> FreeBSD(this could be a problem))

BSDi will be releasing a free version of JDK 1.2 for FreeBSD and will
probably follow on with JDK 1.3. It's still future tense, but good news.

.duncan