Posted to j-dev@xerces.apache.org by Elena Litani <el...@ca.ibm.com> on 2002/08/14 23:59:30 UTC

Performance and XNI change

In order to improve parsing performance of documents that do not include
DOCTYPE, I've tried to modify the XNI pipeline dynamically, removing
DTDValidator during parsing if it is not needed (i.e. no DTD grammar is
found).

The performance improvement depends mostly on the size of the
document. For example, SAX parsing of medium-size documents (100K-1M)
with validation turned off is 8%-12% faster.

I've also created one example based on personal-schema.xml (removing
identity constraints from personal.xsd) with a size of 200K. With
(XML Schema) validation turned on, I found an improvement of 6%-7%.

The above change does not affect small documents (1K-100K) -- the
improvement is minor (less than 1%) and is probably caused by
measurement noise.

However, to be able to modify the pipeline dynamically, we need to make
a change to XNI to add the following method to
xni.parser.XMLDocumentFilter:
  
  public void setDocumentSource(XMLDocumentSource source);

While constructing a parsing pipeline, we need to set the documentSource
on each filter component, so that any component that needs to remove
itself can do so easily by calling
documentSource.setDocumentHandler(this.documentHandler);
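
The self-removal step can be sketched roughly as follows. This is only a
minimal sketch, not the real XNI interfaces: Handler, Source and Filter
are hypothetical, much-reduced stand-ins for XMLDocumentHandler,
XMLDocumentSource and XMLDocumentFilter, with a single characters()
callback instead of the full set of document events. EventScanner,
Recorder and PassThroughValidator are invented names for illustration.

```java
// Hypothetical, reduced stand-ins for the XNI pipeline interfaces.
interface Handler {
    void characters(String text);
}

interface Source {
    void setDocumentHandler(Handler handler);
}

// A filter is a handler for its upstream and a source for its downstream.
interface Filter extends Source, Handler {
    void setDocumentSource(Source source); // the proposed addition
}

// Upstream component that produces events (stands in for the scanner).
class EventScanner implements Source {
    Handler handler;

    public void setDocumentHandler(Handler handler) { this.handler = handler; }

    void emit(String text) { handler.characters(text); }
}

// Downstream component that records what it receives.
class Recorder implements Handler {
    final StringBuilder seen = new StringBuilder();

    public void characters(String text) { seen.append(text); }
}

// Filter that decides on its first event that it has nothing to do
// and removes itself from the pipeline.
class PassThroughValidator implements Filter {
    Source documentSource;
    Handler documentHandler;
    int eventsSeen = 0;

    public void setDocumentSource(Source source) { this.documentSource = source; }

    public void setDocumentHandler(Handler handler) { this.documentHandler = handler; }

    public void characters(String text) {
        eventsSeen++;
        // Splice this filter out: point the upstream source directly
        // at the downstream handler.
        documentSource.setDocumentHandler(documentHandler);
        // The current event still has to be forwarded by hand.
        documentHandler.characters(text);
    }
}
```

With scanner -> validator -> recorder wired up, the first event passes
through the validator and every later event goes from the scanner
straight to the recorder.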

Given the performance gain, I don't expect any objection. As we said
before, XNI is an API under development and could be modified.

If anyone does have any objection or concern, speak up now.


Thank you,
-- 
Elena Litani / IBM Toronto

---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-j-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-j-dev-help@xml.apache.org


Re: Performance and XNI change

Posted by Arnaud Le Hors <le...@us.ibm.com>.
Elena Litani wrote:

> 
> So to set the pipeline we need to call (for filter components):
> f1.setDocumentHandler(f2);
> f2.setDocumentSource(f1);
> 
> If a filter component wants to remove itself, it calls:
> this.documentSource.setDocumentHandler(this.documentHandler);
> this.documentHandler.setDocumentSource(this.documentSource);
> 
> Correct?
> 

Yes. One additional possibility is to specify that setDocumentHandler 
calls setDocumentSource. This way the application only has to do:

f1.setDocumentHandler(f2);

which does: f2.setDocumentSource(this);

It's convenient for the caller. It guarantees that the two links don't 
get out of sync. It requires fewer changes to the existing code, which 
will do the right thing as soon as you change the setDocumentHandler 
implementation.
All it needs is proper javadoc so that people understand they don't need 
to call setDocumentSource. But even if they did, it wouldn't hurt anyway.
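
A rough sketch of that variant, under stated assumptions: DocSource and
DocHandler are hypothetical, reduced stand-ins for XMLDocumentSource and
XMLDocumentHandler (no event callbacks at all), LinkingFilter is an
invented class, and the sketch assumes the handler side exposes
setDocumentSource.

```java
// Hypothetical, reduced stand-ins for the XNI interfaces.
interface DocSource {
    void setDocumentHandler(DocHandler handler);
}

interface DocHandler {
    void setDocumentSource(DocSource source);
}

// A filter component that maintains the back-link itself.
class LinkingFilter implements DocSource, DocHandler {
    final String name;
    DocSource source;   // upstream link
    DocHandler handler; // downstream link

    LinkingFilter(String name) { this.name = name; }

    public void setDocumentHandler(DocHandler handler) {
        this.handler = handler;
        // Implicit back-link: the caller never has to call
        // setDocumentSource on the downstream component.
        if (handler != null) {
            handler.setDocumentSource(this);
        }
    }

    // Calling this explicitly as well is harmless: it stores the same link.
    public void setDocumentSource(DocSource source) { this.source = source; }
}
```

Here f1.setDocumentHandler(f2) alone leaves f2's source pointing at f1,
and a redundant f2.setDocumentSource(f1) changes nothing.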

-- 
Arnaud  Le Hors - IBM, XML Standards Strategy Group / W3C AC Rep.




Re: Performance and XNI change

Posted by Elena Litani <el...@ca.ibm.com>.
Arnaud Le Hors wrote:
> This said, I claim that the setDocumentSource method ought to be on
> XMLDocumentHandler instead of XMLDocumentFilter (which will then inherit
> it from XMLDocumentHandler). And to make "editing" the pipeline more
> convenient we should add the related getters to XMLDocumentHandler and
> XMLDocumentSource.

Well, this sounds good to me. 

So to set the pipeline we need to call (for filter components):
f1.setDocumentHandler(f2);
f2.setDocumentSource(f1);

If a filter component wants to remove itself, it calls:
this.documentSource.setDocumentHandler(this.documentHandler);
this.documentHandler.setDocumentSource(this.documentSource);

Correct?
-- 
Elena Litani / IBM Toronto



Re: Performance and XNI change

Posted by Andy Clark <an...@apache.org>.
Arnaud Le Hors wrote:
> Given the numbers you get, I hope Andy will agree this time, or will 
> come up with some practical reason not to do that (I only ever got FUD 
> from him on that one in the past. ;-)

My objection was not against dynamically modifying
the pipeline. My objection was against allowing the
components control over the pipeline. In my view of
XNI, the parser configuration is in control of the
components, not the other way around.

Having said that, I can also say that now that XNI
has been around a while, I think we know more about
how it behaves and how to write new components and
new parser configurations. As such, providing more
power and flexibility is certainly a reasonable goal,
especially if it improves performance, but only if
it doesn't hurt the XNI core.

So, in general, I don't like changing XNI to allow
components to control the system. But at the same
time, I won't oppose this addition if that's what
people need/want.

> This said, I claim that the setDocumentSource method ought to be on 
> XMLDocumentHandler instead of XMLDocumentFilter (which will then inherit 
> it from XMLDocumentHandler). And to make "editing" the pipeline more 
> convenient we should add the related getters to XMLDocumentHandler and 
> XMLDocumentSource.

The particular change suggested doesn't require
the addition of getter methods. So I have to ask
if we really need them at this time? That would
give each component the ability to completely
traverse and modify the pipeline.

I guess it's not that big of a stretch given the
other change, though.
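
To make the concern concrete, here is a minimal sketch of what getters
would allow. Everything in it is an assumption for illustration:
PipeSource and PipeHandler are hypothetical, reduced stand-ins for
XMLDocumentSource and XMLDocumentHandler, the getter names
getDocumentHandler and getDocumentSource are guesses, and Stage and
PipelineWalker are invented classes.

```java
// Hypothetical, reduced stand-ins; getter names are assumed.
interface PipeSource {
    void setDocumentHandler(PipeHandler handler);
    PipeHandler getDocumentHandler();
}

interface PipeHandler {
    void setDocumentSource(PipeSource source);
    PipeSource getDocumentSource();
}

// Component that is both a source and a handler, like a filter.
class Stage implements PipeSource, PipeHandler {
    final String name;
    private PipeSource source;
    private PipeHandler handler;

    Stage(String name) { this.name = name; }

    public void setDocumentHandler(PipeHandler handler) { this.handler = handler; }
    public PipeHandler getDocumentHandler() { return handler; }
    public void setDocumentSource(PipeSource source) { this.source = source; }
    public PipeSource getDocumentSource() { return source; }
}

class PipelineWalker {
    // Explicitly link two stages in both directions.
    static void link(Stage up, Stage down) {
        up.setDocumentHandler(down);
        down.setDocumentSource(up);
    }

    // Starting from any stage, walk upstream to the head, then list
    // every stage downstream. Assumes every component is a Stage.
    static String describe(Stage start) {
        Stage head = start;
        while (head.getDocumentSource() instanceof Stage) {
            head = (Stage) head.getDocumentSource();
        }
        StringBuilder out = new StringBuilder(head.name);
        PipeHandler next = head.getDocumentHandler();
        while (next instanceof Stage) {
            Stage stage = (Stage) next;
            out.append(" -> ").append(stage.name);
            next = stage.getDocumentHandler();
        }
        return out.toString();
    }
}
```

Given only a reference to the last stage, a component can reach the
head of the pipeline, enumerate every stage, and relink any pair of
them, which is the extra power in question.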

> Yes. One additional possibility is to specify that setDocumentHandler calls setDocumentSource. This way the application only has to do:
> 
> f1.setDocumentHandler(f2);
> 
> which does: f2.setDocumentSource(this);

This would be more convenient for the caller but
I would prefer to keep the calls explicit instead
of having things get re-arranged implicitly like
you suggest.

-- 
Andy Clark * andyc@apache.org




Re: Performance and XNI change

Posted by Arnaud Le Hors <le...@us.ibm.com>.
From the very beginning, basically when XNI was still only a sketch on 
a whiteboard in IBM's Cupertino office, I've wanted to have the 
capability to change the pipeline dynamically for this very reason. So 
you can definitely count me as a supporter!
Given the numbers you get, I hope Andy will agree this time, or will 
come up with some practical reason not to do that (I only ever got FUD 
from him on that one in the past. ;-)

This said, I claim that the setDocumentSource method ought to be on 
XMLDocumentHandler instead of XMLDocumentFilter (which will then inherit 
it from XMLDocumentHandler). And to make "editing" the pipeline more 
convenient we should add the related getters to XMLDocumentHandler and 
XMLDocumentSource.

Elena Litani wrote:

>

> The above change does not affect small documents (1K-100K) -- the
> improvement is minor (less than 1%) and is probably caused by
> measurement noise.


This is because the gain is not significant compared to the rest of the 
code being executed. Not just the measuring code, the whole thing. 
Removing a component from the pipeline saves function calls while 
propagating events through the pipeline. But if only a few events are 
sent out, it won't show much gain. This is consistent.

-- 
Arnaud  Le Hors - IBM, XML Standards Strategy Group / W3C AC Rep.

