Posted to j-dev@xerces.apache.org by Elena Litani <el...@ca.ibm.com> on 2003/10/23 21:57:10 UTC

XNI performance: resetting the pipeline

Hi all,

as we started to measure the performance of the parser for small
documents, we saw that Xerces spends a LOT of time in resetting the
pipeline: before each parse, the configuration calls the reset(..) method
of org.apache.xerces.xni.parser.XMLComponent on each component in the
pipeline. During the reset call, the components not only reset local
settings but also query the features/properties that apply to the
component. Normally, users set features and properties before the first
parse and then tell the parser to parse a bunch of documents. So in the
general case, Xerces spends extra time querying features and properties
that were never changed before the parse.
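To make that cost concrete, here is a minimal sketch of the current behaviour. It assumes simplified stand-ins for the XNI types: SettingsSource plays the role of XMLComponentManager, EagerComponent plays an XMLComponent, and the query counter exists only for illustration; none of these names are Xerces code. The feature URIs are the standard SAX ones.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for XMLComponentManager
// (the real interface also throws XMLConfigurationException).
interface SettingsSource {
    boolean getFeature(String featureId);
}

// Hypothetical manager that counts how many queries the components make.
class CountingManager implements SettingsSource {
    final Map<String, Boolean> features = new HashMap<>();
    int queries = 0;

    public boolean getFeature(String featureId) {
        queries++;
        return features.getOrDefault(featureId, false);
    }
}

// Hypothetical component that re-reads its settings on every reset.
class EagerComponent {
    boolean namespaces;
    boolean validation;
    int depth; // stands in for per-document ("local") state

    void reset(SettingsSource manager) {
        depth = 0; // local state
        namespaces = manager.getFeature("http://xml.org/sax/features/namespaces");
        validation = manager.getFeature("http://xml.org/sax/features/validation");
    }
}

class EagerPipelineDemo {
    // Settings are set once; then every document triggers a full reset.
    static int queriesFor(int components, int documents) {
        CountingManager manager = new CountingManager();
        manager.features.put("http://xml.org/sax/features/namespaces", true);
        EagerComponent[] pipeline = new EagerComponent[components];
        for (int i = 0; i < components; i++) pipeline[i] = new EagerComponent();
        for (int doc = 0; doc < documents; doc++) {
            for (EagerComponent c : pipeline) c.reset(manager); // before each parse
        }
        return manager.queries;
    }
}
```

With 5 components and 1,000 documents, queriesFor returns 10,000 even though no setting changed after the first parse.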

To verify how much extra time we spend, I've changed the code to let the
configuration decide when the XMLComponentManager needs to be passed
in reset() to the components. If there was no change in features or
properties, the configuration passes a "null" XMLComponentManager while
resetting the components, so each component will only reset its local
settings and won't attempt to query settings.
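A minimal sketch of this configuration-side change follows, assuming simplified stand-ins for the XNI interfaces. Manager, Component, SketchConfiguration, NullAwareComponent and the settingsChanged flag are hypothetical names, not Xerces code. The configuration remembers whether any setting changed since the last parse and passes null otherwise; the component treats null as "keep the settings you already have".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-ins for XMLComponentManager / XMLComponent.
interface Manager {
    boolean getFeature(String featureId);
}

interface Component {
    void reset(Manager manager);
}

class SketchConfiguration implements Manager {
    private final Map<String, Boolean> features = new HashMap<>();
    private final List<Component> components = new ArrayList<>();
    private boolean settingsChanged = true; // the first parse must configure everything
    int queries = 0;                        // counts manager lookups, for illustration

    void addComponent(Component c) { components.add(c); }

    void setFeature(String featureId, boolean state) {
        features.put(featureId, state);
        settingsChanged = true;
    }

    public boolean getFeature(String featureId) {
        queries++;
        return features.getOrDefault(featureId, false);
    }

    // Called before each parse: pass null when nothing changed.
    void resetPipeline() {
        Manager toPass = settingsChanged ? this : null;
        for (Component c : components) c.reset(toPass);
        settingsChanged = false;
    }
}

class NullAwareComponent implements Component {
    boolean namespaces;
    int depth; // per-document state

    public void reset(Manager manager) {
        depth = 0;                    // local state is always reset
        if (manager == null) return;  // settings unchanged: keep cached values
        namespaces = manager.getFeature("http://xml.org/sax/features/namespaces");
    }
}
```

With three components and 100 parses after a single setFeature call, only the first reset queries the manager (3 queries in total); changing a feature makes the next reset query again.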

By making this change I've seen up to 20% performance improvement (1-2k
documents), so I think it would be great if we made this change. 

The only big question I have is whether anyone thinks this is an XNI change.
The docs state the following:

/**
  * Resets the component. The component can query the component manager
  * about any features and properties that affect the operation of the
  * component.
  */
public void reset(XMLComponentManager componentManager) 
        throws XMLConfigurationException;

So the docs do not state explicitly that "null" is allowed. But in other
places in XNI, "null" value is also not stated explicitly but could be
used (e.g. while setting a document handler).

So what do you think?

There are other possible solutions; however, I think letting the
configuration control whether properties or features need to be
queried by the component is the cleanest approach.

Just in case you are wondering what the other solutions are:
1) each component could implement setFeature and setProperty and receive
all the properties and features through them. Then, during
XMLComponent.reset(), no features or properties would be queried.
I am not sure if XNI components were designed to be reset in such a way,
and I suspect this approach might be a bit slower.

2) introduce a new internal feature, e.g.
"/internal/settings-unchanged", that
each component can query before querying the rest of features and
properties. If this new feature is set to true, the component won't
query any other features. Again, this is a bit slower, plus we would be
introducing yet another feature to Xerces.
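Option 2 can be sketched as below, assuming a simplified stand-in for XMLComponentManager. The feature id "/internal/settings-unchanged" is the example from the proposal, not an existing Xerces feature, and FeatureSource, MarkerManager and MarkerAwareComponent are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for XMLComponentManager.
interface FeatureSource {
    boolean getFeature(String featureId);
}

class MarkerManager implements FeatureSource {
    // Example id from the proposal; hypothetical, not a real Xerces feature.
    static final String SETTINGS_UNCHANGED = "/internal/settings-unchanged";
    final Map<String, Boolean> features = new HashMap<>();
    int queries = 0; // counts lookups, for illustration

    public boolean getFeature(String featureId) {
        queries++;
        return features.getOrDefault(featureId, false);
    }
}

class MarkerAwareComponent {
    boolean namespaces;
    boolean validation;
    int depth; // per-document state

    void reset(FeatureSource manager) {
        depth = 0; // local state is always reset
        // One extra query up front; skip the rest if nothing changed.
        if (manager.getFeature(MarkerManager.SETTINGS_UNCHANGED)) return;
        namespaces = manager.getFeature("http://xml.org/sax/features/namespaces");
        validation = manager.getFeature("http://xml.org/sax/features/validation");
    }
}
```

When the marker is true, each reset costs a single query; when it is false, the component pays for the marker plus its usual queries (3 instead of 2 in this sketch).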

Thank you,
-- 
Elena Litani / IBM Toronto

---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-j-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-j-dev-help@xml.apache.org


Re: XNI performance: resetting the pipeline

Posted by Andy Clark <an...@apache.org>.
Have we come to a conclusion on this issue? As I
mentioned earlier, I would like to see first the
performance gain from the "special" feature that
tells components nothing has changed.

What's with the silence on the mailing list? Is
everyone trapped in the Matrix...?

-- 
Andy Clark * andyc@apache.org




Re: XNI performance: resetting the pipeline

Posted by Andy Clark <an...@apache.org>.
Arnaud Le Hors wrote:
> Andy, this last recommendation is at odds with your first statement.
> I agree we should look for performance gains in other places. But we 
> should do that too, not instead. 20% is such a large number that we 
> ought to address this first.

I think I stated it poorly. What I meant was that
I don't think we can change this safely, as per the
concerns I detailed in my other posts. Therefore, I
think we're going to end up having to improve other
areas in order to make up the lost performance.

> I personally like Elena's first proposal, to pass null. I think it's 
> simple and while I agree it *may* cause some disruption the gain is such 
> that it makes it worthwhile. Think about the number of people/users who 
> will benefit from it vs the number of component developers who will have 
> to update their components...

Yes, but I don't know if it will ultimately work.

-- 
Andy Clark * andyc@apache.org




Re: XNI performance: resetting the pipeline

Posted by Andy Clark <an...@apache.org>.
Ted Leung wrote:
> +1
> 
> 20% is a non-trivial gain.  And there are likely to be a lot more people 
>  who want to parse small documents than want to use custom configurations.

Ack! Don't shoot so fast from the hip, Ted.

I would like to see the gain achieved from the
non-breaking change first. If it's approximately
20% gain from the Xerces components, then I see
no reason not to change this in a NON-BREAKING
way. :)

-- 
Andy Clark * andyc@apache.org




Re: XNI performance: resetting the pipeline

Posted by Ted Leung <tw...@sauria.com>.
On 10/27/2003 3:27 PM, Arnaud Le Hors wrote:

> I personally like Elena's first proposal, to pass null. I think it's 
> simple and while I agree it *may* cause some disruption the gain is such 
> that it makes it worthwhile. Think about the number of people/users who 
> will benefit from it vs the number of component developers who will have 
> to update their components...

+1

20% is a non-trivial gain.  And there are likely to be a lot more people 
  who want to parse small documents than want to use custom configurations.

Ted




Re: XNI performance: resetting the pipeline

Posted by Arnaud Le Hors <le...@us.ibm.com>.
Andy Clark wrote:
 > Interesting. I'm quite surprised that 20% of parsing
 > time (for small documents) is taken up by this. I guess
 > it's directly related to the number of components in
 > the parser configuration.
...
 >
 > Perhaps we need to look for performance gains in
 > other places...
 >

Andy, this last recommendation is at odds with your first statement.
I agree we should look for performance gains in other places. But we 
should do that too, not instead. 20% is such a large number that we 
ought to address this first.
I personally like Elena's first proposal, to pass null. I think it's 
simple and while I agree it *may* cause some disruption the gain is such 
that it makes it worthwhile. Think about the number of people/users who 
will benefit from it vs the number of component developers who will have 
to update their components...
--
Arnaud  Le Hors - IBM, XML Standards Strategy Group / W3C AC Rep.




Re: XNI performance: resetting the pipeline

Posted by Andy Clark <an...@apache.org>.
Elena Litani wrote:
> So the docs do not state explicitly that "null" is allowed. But in other
> places in XNI, "null" value is also not stated explicitly but could be
> used (e.g. while setting a document handler).
> 
> So what do you think?

Interesting. I'm quite surprised that 20% of parsing
time (for small documents) is taken up by this. I guess
it's directly related to the number of components in
the parser configuration.

> So the docs do not state explicitly that "null" is allowed. But in other
> places in XNI, "null" value is also not stated explicitly but could be
> used (e.g. while setting a document handler).

This is certainly a possibility, but it would require
a change to all components. That is not too bad for
Xerces, but what about other people's components? So
this would be a breaking change.

> Just in case you are wondering what the other solutions are:
> 1) each component could implement setFeature and setProperty and receive
> all the properties and features through them. Then, during
> XMLComponent.reset(), no features or properties would be queried.
> I am not sure if XNI components were designed to be reset in such a way,
> and I suspect this approach might be a bit slower.

I'm not quite following your explanation. Are you
saying that each component would still be passed the
component manager but that the component is smart
enough to *not* query the settings that it got from
the last parse? This seems fragile at best.
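The push-style reading of option 1 questioned above can be sketched as follows. XMLComponent does declare setFeature and setProperty; everything else here (PushComponent, PushConfiguration, the pushes counter) is a hypothetical simplification, not Xerces code. Settings flow into components only when they change, so reset() never queries the manager; the fragility is that every component must reliably cache whatever it was pushed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical push-style component: settings arrive through setFeature,
// so reset() only clears per-document state and never queries a manager.
class PushComponent {
    boolean namespaces;
    int depth; // per-document state

    void setFeature(String featureId, boolean state) {
        if ("http://xml.org/sax/features/namespaces".equals(featureId)) {
            namespaces = state; // cache the pushed value
        }
    }

    // The argument stands in for XMLComponentManager; it is unused here.
    void reset(Object componentManager) {
        depth = 0;
    }
}

// Hypothetical configuration that forwards each change to every component.
class PushConfiguration {
    final List<PushComponent> components = new ArrayList<>();
    int pushes = 0; // counts forwarded settings, for illustration

    void setFeature(String featureId, boolean state) {
        for (PushComponent c : components) {
            c.setFeature(featureId, state);
            pushes++;
        }
    }

    void resetPipeline() {
        for (PushComponent c : components) c.reset(this);
    }
}
```

Setting one feature on a four-component pipeline costs four pushes, and any number of subsequent resets adds no queries at all.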

> 2) introduce a new internal feature, e.g.
> "/internal/settings-unchanged", that
> each component can query before querying the rest of features and
> properties. If this new feature is set to true, the component won't
> query any other features. Again, this is a bit slower, plus we would be
> introducing yet another feature to Xerces.

I think that this would be the safest change that
would allow existing components to continue to work.
However, it makes me wonder about how well this
would actually work in practice. Would this apply
to *all* features and properties? Or only internal
ones? Only external ones?

The more dynamic the internals, the more likely
that some setting has changed. Then this special
internal marker feature becomes useless and actually
*adds* work because each component would need to
query this feature and then still need to query
all the other ones.

And when you look at it, this is no different than
passing a null component manager. Unless absolutely
no setting changed, you would *not* be able to pass
a null reference.

Perhaps we need to look for performance gains in
other places...

-- 
Andy Clark * andyc@apache.org

