You are viewing a plain text version of this content. The canonical link for it is here.
Posted to j-dev@xerces.apache.org by Er...@Classwell.com on 2002/11/15 17:35:19 UTC
Re: Schema key and unique contraints VERY slow


I miswrote the version--I'm really using 2.2.1 and so seeing this slowness
with the latest version. I'm moving this thread to xerces-j-dev...

I can understand the relatively low priority of constraint
handling...though uncommonness of a feature's use and lack of useable
support for it is kind of a chicken-and-egg / self-fulfilling prophesy kind
of thing.

I'm not ruling out digging into the code myself to possibly take a look at
the problem and possibly fixing it, though I wouldn't want to make any
promises that I'll find time for it. In case I do, I'd welcome any pointers
anyone wants to give me to finding the code that handles this.

But before digging into code, perhaps we could discuss at a high level,
before d how the xpath bit in these constraints are handled. The almost
equal slowness of the SAX parsing of this makes me wonder if both parsing
methods are using the same xpath code, perhaps DOM-based xpath code, like
that provided by Xalan (a suspicion fed by the lack of SAX based xpath
processors available). If that's the case, then there is likely huge room
for optimization.

Its all the functions and different axes and whatnot that makes xpath
processors hard to write (espec in SAX), but the Schema spec indicates that
these xpaths are allowed only to be a subset of xpath. I'm not very good at
deciphering the mathy jargon of the spec, but it looks to me like the
subset they define would be much easier to write a code for, including SAX
code that doesn't have to resort to building a DOM tree. Even if that's not
the case, the most likely usages of xpath for these constraints are going
to be the most simple ones, which would be relatively easy to detect and
handle with simpler more efficient code that doesn't cover all the
possibilies.

Anyway, I'd appreciate being confirmed or disabused of these speculations
before I decide whether  to start digging into Xerces code. :^)

Your comment about "Xerces will take O(N^2) operations to prove it to
itself" puzzles me though. Surely it doesn't iterate through the entire
document again every time it finds a key node, to compare for dupes? Surely
it simply stores the keyvalue into a HashSet or something as it goes and
checks for previous key existance as it goes, giving O(N) operations?

Eric








                                                                                                            
                      neilg@ca.ibm.com                                                                      
                                               To:       xerces-j-user@xml.apache.org                       
                      11/15/2002 10:40         cc:                                                          
                      AM                       Subject:  Re: Schema key and unique contraints VERY slow     
                      Please respond to                                                                     
                      xerces-j-user                                                                         
                                                                                                            
                                                                                                            




Hi Eric,

You'll definitely want to move to 2.2.1; a fair bit of work has been done
on improving performance (and conformance!) of identity constraint support
since 2.0.2.

That said, I can't say I'm shocked that you're not seeing good performance.
If you have N distinct keys in your document, Xerces will take O(N^2)
operations to prove it to itself; I daresay something could be done on this
front.  But our goal is to optimize performance for the case in which ID
constraints aren't used, since they seem to be relatively uncommon; so any
change to make them perform better would have to live within these
constraints.

You're the first I know of to report this kind of problem.  So I guess you
could file this as a bug, but I can't promise how fast it'd be
investigated.  As I always say, patches welcome!  :)

Cheers,
Neil
Neil Graham
XML Parser Development
IBM Toronto Lab
Phone:  905-413-3519, T/L 969-3519
E-mail:  neilg@ca.ibm.com




|---------+-------------------------------->
|         |           Eric_Schwarzenbach@Cl|
|         |           asswell.com          |
|         |                                |
|         |           11/14/2002 09:04 PM  |
|         |           Please respond to    |
|         |           xerces-j-user        |
|         |                                |
|---------+-------------------------------->
  >
---------------------------------------------------------------------------------------------------------------------------------------------|

  |
|
  |       To:       xerces-j-user@xml.apache.org
|
  |       cc:
|
  |       Subject:  Schema key and unique contraints VERY slow
|
  |
|
  |
|
  >
---------------------------------------------------------------------------------------------------------------------------------------------|





Xerces2 (I'm using the freshly downloaded 2.0.2) seems to be hideously slow
valdiating xs:unique and xs:key constraints. Painfully slow on a human
scale not simply a processor scale...a large document of mine that
specifies a unique id attribute on each element of a certain kind (where we
are talking 10,000 of these elements in a document of about 4 Meg ) goes
from taking a few seconds to validate without these constraints to several
minutes with them. This happens with both SAX and DOM parsing.

I expect there to be some processing cost to using this feature but this is
fairly ridiculous...doing similar checking in my own java code which is
using the parser takes nowhere nearly as long (in fact building indexes on
the entire document along with it takes nowhere nearly as long). Something
would seem to be seriously amiss, unless I'm out to lunch with regard to my
usage...

An example of my usage is something like:

            <xs:unique name="makeElemUnique">
                  <xs:selector xpath=".//myns:elem"/>
                  <xs:field xpath="@id"/>
            </xs:unique>

This is defined within the scope of the root element which can (will)
contain many of these elems at many different levels.

Should I file this as a bug?

Eric





---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-j-user-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-j-user-help@xml.apache.org





---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-j-user-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-j-user-help@xml.apache.org







---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-j-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-j-dev-help@xml.apache.org