You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@commons.apache.org by Knut Wannheden <kn...@gmail.com> on 2005/01/14 10:37:16 UTC

[JXPath] unordered node-sets

Dear JXPath users and developers,

Some XPath expressions (e.g. "//foo" which uses the descendant axis), when
applied to a Java object graph, can cause JXPath to traverse a vast number of
objects, which potentially takes a lot of time. This is what happens when
hasNext() or next() is called on the Iterator returned by
JXPathContext#iterate(String). Moreover, as the call to hasNext() or next()
seems to cause all matches to be pre-fetched and ordered (at least when the
expression starts with the descendant axis), the call appears to be blocking.

I was wondering if there is a way to iterate through the matched node-set in a
non-blocking manner. My understanding was that node-sets per definition are
unordered.

TIA,

--knut



---------------------------------------------------------------------
To unsubscribe, e-mail: commons-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-user-help@jakarta.apache.org

Re: [JXPath] unordered node-sets

Posted by Knut Wannheden <kn...@gmail.com>.

Dmitri,

Dmitri Plotnikov <dmitri <at> apache.org> writes:

> 
> Ok, I think I now get it. Sorry about having been so obtuse.  To make the
search interruptable, we should
> eliminate the isChildOrderingRequired method. Actually I don't even know why
we have it. Paths like
> "//foo" would produce results in the right order even without it.
> 
> Try this: comment out the DescendantContext#isChildOrderingRequired method. 
Let me know if it makes a
> difference.  If it does without breaking anything else, I will make the change
permanent.
> 

Yes, that change did help!  I will still leave my dirty hack in for interrupting
other kinds of badly written expressions.

Cheers,

--knut


---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org

Re: [JXPath] unordered node-sets

Posted by Dmitri Plotnikov <dm...@apache.org>.

Knut,

Knut Wannheden <kn...@gmail.com> wrote:

Dmitri,

Dmitri Plotnikov apache.org> writes:

> 
> You are right - I am starting to forget everything. The node set does not 
> have to be ordered, but if I want to implement axes like "preceding" and 
> "following", I have to impose some order on the object properties.
> 

What does this mean for JXPath? Does the ordering only ever have to take place
in PredicateContext?

I don't think there is any ordering in PredicateContext.  The ordering of properties is done by JXPathBasicBeanInfo, the ordering of keys in maps is done by DynamicPropertyPointer.

> The advice I have given in similar situations before was always the same: 
> You have at least three options:
> 1. Provide custom BeanInfo that excludes upward going links, e.g. keep the 
> link from Window to Widget, but hide the link from Widget to Window.
> 2. Create custom implementations of NodePointer and NodePointerFactory 
> (which seems to be exactly what you are doing)
> 3. (IMO, the best) avoid using JXPath for extensive searches altogether - it 
> is not designed to handle them. It can show horrific performance, which is 
> exactly what you are observing. Systems like Xalan do much better because 
> they work with static structures that they index prior to the evaluation of 
> XPaths. Try either using specific paths like "/a/b/c" or utilize indexing 
> with IdentityManager or KeyManager; or use a custom extension function to 
> perform the search.
> 

I will try to go with option 2 as I don't think 3 really is an option; or what
other tool would let me evaluate XPaths against Java object graphs?

I don't worry about the performance too much as the queries will be run by a
user through a GUI. What is problematic is the fact that the call to
Iterator#hasNext(), depending on the expression and object graph, is 
potentially long running and can't be interrupted. Long running not because
JXPath couldn't yet determine if there is a next node or not, but because it
decided to prefetch and order *all* nodes which will be returned by subsequent
calls to Iterator#next().

In my case I found that this prefetching and ordering takes place in
ChildContext#constructIterator(). This method is called by
EvalContext#hasNext(), because it determined that the nodes need to be
prefetched since the ChildContext's parent context is a DescendantContext, 
which requires child ordering (in document order).

Now I hope you can see where my problem lies and why I initially asked about 
the interpretation of XPath node-sets. After browsing the source code I think 
I now understand why it works the way it works. What if the ordering (forward
and backward) were entirely left up to the NodeIterators? Would that solve the
problem?

Meanwhile, to be able to interrupt JXPath, I will implement my own NodePointer
throwing an unchecked exception to abort JXPath. (I tried this in the
NodePointerFactory, but there the exception is caught and ignored.)

Ok, I think I now get it. Sorry about having been so obtuse.  To make the search interruptable, we should eliminate the isChildOrderingRequired method. Actually I don't even know why we have it. Paths like "//foo" would produce results in the right order even without it.

Try this: comment out the DescendantContext#isChildOrderingRequired method.  Let me know if it makes a difference.  If it does without breaking anything else, I will make the change permanent.

Regards,

--knut

Best,

- Dmitri

Re: [JXPath] unordered node-sets

Posted by Dmitri Plotnikov <dm...@apache.org>.

Knut,

Knut Wannheden <kn...@gmail.com> wrote:

Dmitri,

Dmitri Plotnikov apache.org> writes:

> 
> You are right - I am starting to forget everything. The node set does not 
> have to be ordered, but if I want to implement axes like "preceding" and 
> "following", I have to impose some order on the object properties.
> 

What does this mean for JXPath? Does the ordering only ever have to take place
in PredicateContext?

I don't think there is any ordering in PredicateContext.  The ordering of properties is done by JXPathBasicBeanInfo, the ordering of keys in maps is done by DynamicPropertyPointer.

> The advice I have given in similar situations before was always the same: 
> You have at least three options:
> 1. Provide custom BeanInfo that excludes upward going links, e.g. keep the 
> link from Window to Widget, but hide the link from Widget to Window.
> 2. Create custom implementations of NodePointer and NodePointerFactory 
> (which seems to be exactly what you are doing)
> 3. (IMO, the best) avoid using JXPath for extensive searches altogether - it 
> is not designed to handle them. It can show horrific performance, which is 
> exactly what you are observing. Systems like Xalan do much better because 
> they work with static structures that they index prior to the evaluation of 
> XPaths. Try either using specific paths like "/a/b/c" or utilize indexing 
> with IdentityManager or KeyManager; or use a custom extension function to 
> perform the search.
> 

I will try to go with option 2 as I don't think 3 really is an option; or what
other tool would let me evaluate XPaths against Java object graphs?

I don't worry about the performance too much as the queries will be run by a
user through a GUI. What is problematic is the fact that the call to
Iterator#hasNext(), depending on the expression and object graph, is 
potentially long running and can't be interrupted. Long running not because
JXPath couldn't yet determine if there is a next node or not, but because it
decided to prefetch and order *all* nodes which will be returned by subsequent
calls to Iterator#next().

In my case I found that this prefetching and ordering takes place in
ChildContext#constructIterator(). This method is called by
EvalContext#hasNext(), because it determined that the nodes need to be
prefetched since the ChildContext's parent context is a DescendantContext, 
which requires child ordering (in document order).

Now I hope you can see where my problem lies and why I initially asked about 
the interpretation of XPath node-sets. After browsing the source code I think 
I now understand why it works the way it works. What if the ordering (forward
and backward) were entirely left up to the NodeIterators? Would that solve the
problem?

Meanwhile, to be able to interrupt JXPath, I will implement my own NodePointer
throwing an unchecked exception to abort JXPath. (I tried this in the
NodePointerFactory, but there the exception is caught and ignored.)

Ok, I think I now get it. Sorry about having been so obtuse.  To make the search interruptable, we should eliminate the isChildOrderingRequired method. Actually I don't even know why we have it. Paths like "//foo" would produce results in the right order even without it.

Try this: comment out the DescendantContext#isChildOrderingRequired method.  Let me know if it makes a difference.  If it does without breaking anything else, I will make the change permanent.

Regards,

--knut

Best,

- Dmitri

Re: [JXPath] unordered node-sets

Posted by Knut Wannheden <kn...@gmail.com>.

Dmitri,

Dmitri Plotnikov <dmitri <at> apache.org> writes:

> 
> You are right - I am starting to forget everything.  The node set does not 
> have to be ordered, but if I want to implement axes like "preceding" and 
> "following", I have to impose some order on the object properties.
> 

What does this mean for JXPath?  Does the ordering only ever have to take place
in PredicateContext?

> The advice I have given in similar situations before was always the same: 
> You have at least three options:
> 1. Provide custom BeanInfo that excludes upward going links, e.g. keep the 
> link from Window to Widget, but hide the link from Widget to Window.
> 2. Create custom implementations of NodePointer and NodePointerFactory 
> (which seems to be exactly what you are doing)
> 3. (IMO, the best) avoid using JXPath for extensive searches altogether - it 
> is not designed to handle them.  It can show horrific performance, which is 
> exactly what you are observing. Systems like Xalan do much better because 
> they work with static structures that they index prior to the evaluation of 
> XPaths. Try either using specific paths like "/a/b/c" or utilize indexing 
> with IdentityManager or KeyManager; or use a custom extension function to 
> perform the search.
> 

I will try to go with option 2 as I don't think 3 really is an option; or what
other tool would let me evaluate XPaths against Java object graphs?

I don't worry about the performance too much as the queries will be run by a
user through a GUI.  What is problematic is the fact that the call to
Iterator#hasNext(), depending on the expression and object graph, is 
potentially long running and can't be interrupted.  Long running not because
JXPath couldn't yet determine if there is a next node or not, but because it
decided to prefetch and order *all* nodes which will be returned by subsequent
calls to Iterator#next().

In my case I found that this prefetching and ordering takes place in
ChildContext#constructIterator().  This method is called by
EvalContext#hasNext(), because it determined that the nodes need to be
prefetched since the ChildContext's parent context is a DescendantContext, 
which requires child ordering (in document order).

Now I hope you can see where my problem lies and why I initially asked about 
the interpretation of XPath node-sets.  After browsing the source code I think 
I now understand why it works the way it works.  What if the ordering (forward
and backward) were entirely left up to the NodeIterators?  Would that solve the
problem?

Meanwhile, to be able to interrupt JXPath, I will implement my own NodePointer
throwing an unchecked exception to abort JXPath.  (I tried this in the
NodePointerFactory, but there the exception is caught and ignored.)

Regards,

--knut

---------------------------------------------------------------------
To unsubscribe, e-mail: commons-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-user-help@jakarta.apache.org

Re: [JXPath] unordered node-sets

Posted by Dmitri Plotnikov <dm...@apache.org>.

Knut,

Thank you for the feedback.  There is no need to make an "official" feature request. I'll take care of it.

Regards,

- Dmitri

Knut Wannheden <kn...@gmail.com> wrote:
Dmitri,

Knut Wannheden gmail.com> writes:

> 
> Dmitri Plotnikov apache.org> writes:
> 
> > @name is a reserved attribute name for Java Objects. It in fact checks the 
> > name of the property/key rather than the value of the property called 
> > "name". I think it is equivalent to "[name() = 'Foo']. This feature should 
> > be deprecated, and I will try to do so in a future release.
> > 
> 
> Yes, I think name() is indeed equivalent and IMO preferable as it's more XPath
> style and doesn't keep users from using @name to access an object's property
> "name" (i.e. getName()).
> 
> In the JXPath based EMF search tool I've been working on
> (http://emfsearch.sourceforge.net/) I have now customized the JXPath axes as
> I've previously outlined (child:: and thus also descendant:: only for EMF
> containment references). With this customization it is now very difficult to
> write queries which access the getName() method of an object. Instead of
> "[@name = 'Foo']" I have to use something like "[string(@name) = 'Foo']".
> 

If found a reasonable workaround for this which might be useful to others. The
equality check can simply be turned around. I.e. "['Foo' = @name]" works fine.

Now I've just got two more questions on this :-) Have you been able to check if
removing DescendantContext#isChildOrderingRequired() breaks anything? Do you
want me to open a feature request for either of these two changes?

Regards,

--knut

---------------------------------------------------------------------
To unsubscribe, e-mail: commons-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-user-help@jakarta.apache.org

Re: [JXPath] unordered node-sets

Posted by Knut Wannheden <kn...@gmail.com>.

Dmitri,

Knut Wannheden <knut.wannheden <at> gmail.com> writes:

> 
> Dmitri Plotnikov <dmitri <at> apache.org> writes:
> 
> > @name is a reserved attribute name for Java Objects.  It in fact checks the 
> > name of the property/key rather than the value of the property called 
> > "name". I think it is equivalent to "[name() = 'Foo'].  This feature should 
> > be deprecated, and I will try to do so in a future release.
> > 
> 
> Yes, I think name() is indeed equivalent and IMO preferable as it's more XPath
> style and doesn't keep users from using @name to access an object's property
> "name" (i.e. getName()).
> 
> In the JXPath based EMF search tool I've been working on
> (http://emfsearch.sourceforge.net/) I have now customized the JXPath axes as
> I've previously outlined (child:: and thus also descendant:: only for EMF
> containment references).  With this customization it is now very difficult to
> write queries which access the getName() method of an object.  Instead of
> "[@name = 'Foo']" I have to use something like "[string(@name) = 'Foo']".
> 

If found a reasonable workaround for this which might be useful to others.  The
equality check can simply be turned around.  I.e. "['Foo' = @name]" works fine.

Now I've just got two more questions on this :-)  Have you been able to check if
removing DescendantContext#isChildOrderingRequired() breaks anything?  Do you
want me to open a feature request for either of these two changes?

Regards,

--knut


---------------------------------------------------------------------
To unsubscribe, e-mail: commons-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-user-help@jakarta.apache.org

Re: [JXPath] unordered node-sets

Posted by Knut Wannheden <kn...@gmail.com>.

Dmitri,

Dmitri Plotnikov <dmitri <at> apache.org> writes:

> @name is a reserved attribute name for Java Objects.  It in fact checks the 
> name of the property/key rather than the value of the property called 
> "name". I think it is equivalent to "[name() = 'Foo'].  This feature should 
> be deprecated, and I will try to do so in a future release.
> 

Yes, I think name() is indeed equivalent and IMO preferable as it's more XPath
style and doesn't keep users from using @name to access an object's property
"name" (i.e. getName()).

In the JXPath based EMF search tool I've been working on
(http://emfsearch.sourceforge.net/) I have now customized the JXPath axes as
I've previously outlined (child:: and thus also descendant:: only for EMF
containment references).  With this customization it is now very difficult to
write queries which access the getName() method of an object.  Instead of
"[@name = 'Foo']" I have to use something like "[string(@name) = 'Foo']".

Thus I'd be very much in favor of deprecating or even removing this feature. 
Meanwhile, how would I go about disabling this feature in my application?

BTW, should I report an enhancement request for this?

Regards,

--knut

---------------------------------------------------------------------
To unsubscribe, e-mail: commons-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-user-help@jakarta.apache.org

Re: [JXPath] unordered node-sets

Posted by Dmitri Plotnikov <dm...@apache.org>.

Knut,

> Dmitri,
>
> Dmitri Plotnikov <dmitri <at> apache.org> writes:
>
>>
>> Perhaps my reading of the XPath specification is incorrect, but I 
>> understood
>> it to say that node sets are in fact ordered.  The order of elements in 
>> the
>> set is determined by the document order as well as the axis used (for
>> example, "preceding::" will return nodes in the reverse document order).
>
> To me this is somewhat unclear in the specification.  In section 1 a 
> node-set is
> defined as "an unordered collection of nodes without duplicates".  And in 
> the
> next paragraph it is stated that the evaluation context consists of 
> (amongst
> other things) a "context position".  This context position is set to the
> proximity position (which is always with respect to an axis) as the 
> predicates
> are evaluated.

You are right - I am starting to forget everything.  The node set does not 
have to be ordered, but if I want to implement axes like "preceding" and 
"following", I have to impose some order on the object properties.

>> When I was first developing JXPath, I struggled with the issue of 
>> document
>> order as it applies to the non-XML object models.  The best solution I 
>> could
>> come up with was to impose some predictable order that makes sense.  Of
>> course the first one that came to mind was the alphabetical order.  So, 
>> this
>> is why property names of Java Beans are sorted by JXPath machinery. 
>> That's
>> the only sorting I can think of that goes on in JXPath.
>>
>> As to your specific issue, I cannot really make any specific 
>> recommendations
>> without seeing the code.  Would it be possible to post a code sample that
>> illustrates the problem?
>>
>
> What I think is the root cause of my problem is that I in my model have 
> many
> two-way references between the objects, so even with a small model there 
> are
> uncountable different paths to reach the same object using the descendant 
> axis.
> (We've already discussed this:
> http://article.gmane.org/gmane.comp.jakarta.commons.user/3154.)  A call to
> hasNext() to the Iterator returned by JXPathContext#iterate(String) will 
> at some
> point (I think just after all immedeate children have been checked) start
> exploring all these paths before returning, which potentially takes 
> "forever".
>
> I am unsure in what format to provide any example code as any EMF model 
> would
> pull in quite a lot of dependencies.  But I could provide instructions on 
> how to
> set it up.
>
> One way around the bidirectional references problem is to take some of the 
> EMF
> metadata about the object properties into account (see also
> http://thread.gmane.org/gmane.comp.jakarta.commons.user/3153), which I BTW 
> would
> like to do anyway.  I could then for instance limit the child axis to
> containment references only.
The advice I have given in similar situations before was always the same: 
You have at least three options:
1. Provide custom BeanInfo that excludes upward going links, e.g. keep the 
link from Window to Widget, but hide the link from Widget to Window.
2. Create custom implementations of NodePointer and NodePointerFactory 
(which seems to be exactly what you are doing)
3. (IMO, the best) avoid using JXPath for extensive searches altogether - it 
is not designed to handle them.  It can show horrific performance, which is 
exactly what you are observing. Systems like Xalan do much better because 
they work with static structures that they index prior to the evaluation of 
XPaths. Try either using specific paths like "/a/b/c" or utilize indexing 
with IdentityManager or KeyManager; or use a custom extension function to 
perform the search.

> I have started playing with this by writing my own implementations of
> NodePointerFactory and NodePointer.  Possibly I also need a custom 
> NodeIterator
> implementation.  The problem I have here, and in fact also with the 
> default Java
> Beans implementation, is that a location step with a predicate "[@name = 
> 'Foo']"
> doesn't seem to check for a getName() on the context object.  But "[name =
> 'Foo']" works.  Note that the context object isn't a Map.  Where would I
> override this behaviour?

@name is a reserved attribute name for Java Objects.  It in fact checks the 
name of the property/key rather than the value of the property called 
"name". I think it is equivalent to "[name() = 'Foo'].  This feature should 
be deprecated, and I will try to do so in a future release.

>
> Regards,
>
> --knut

Yours,

- Dmitri 


---------------------------------------------------------------------
To unsubscribe, e-mail: commons-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-user-help@jakarta.apache.org

Re: [JXPath] unordered node-sets

Posted by Knut Wannheden <kn...@gmail.com>.

Dmitri,

Dmitri Plotnikov <dmitri <at> apache.org> writes:

> 
> Perhaps my reading of the XPath specification is incorrect, but I understood 
> it to say that node sets are in fact ordered.  The order of elements in the 
> set is determined by the document order as well as the axis used (for 
> example, "preceding::" will return nodes in the reverse document order). 

To me this is somewhat unclear in the specification.  In section 1 a node-set is
defined as "an unordered collection of nodes without duplicates".  And in the
next paragraph it is stated that the evaluation context consists of (amongst
other things) a "context position".  This context position is set to the
proximity position (which is always with respect to an axis) as the predicates
are evaluated.

> When I was first developing JXPath, I struggled with the issue of document 
> order as it applies to the non-XML object models.  The best solution I could 
> come up with was to impose some predictable order that makes sense.  Of 
> course the first one that came to mind was the alphabetical order.  So, this 
> is why property names of Java Beans are sorted by JXPath machinery.  That's 
> the only sorting I can think of that goes on in JXPath.
> 
> As to your specific issue, I cannot really make any specific recommendations 
> without seeing the code.  Would it be possible to post a code sample that 
> illustrates the problem?
> 

What I think is the root cause of my problem is that I in my model have many
two-way references between the objects, so even with a small model there are
uncountable different paths to reach the same object using the descendant axis.
(We've already discussed this:
http://article.gmane.org/gmane.comp.jakarta.commons.user/3154.)  A call to
hasNext() to the Iterator returned by JXPathContext#iterate(String) will at some
point (I think just after all immedeate children have been checked) start
exploring all these paths before returning, which potentially takes "forever".

I am unsure in what format to provide any example code as any EMF model would
pull in quite a lot of dependencies.  But I could provide instructions on how to
set it up.

One way around the bidirectional references problem is to take some of the EMF
metadata about the object properties into account (see also
http://thread.gmane.org/gmane.comp.jakarta.commons.user/3153), which I BTW would
like to do anyway.  I could then for instance limit the child axis to
containment references only.

I have started playing with this by writing my own implementations of
NodePointerFactory and NodePointer.  Possibly I also need a custom NodeIterator
implementation.  The problem I have here, and in fact also with the default Java
Beans implementation, is that a location step with a predicate "[@name = 'Foo']"
doesn't seem to check for a getName() on the context object.  But "[name =
'Foo']" works.  Note that the context object isn't a Map.  Where would I
override this behaviour?

Regards,

--knut

---------------------------------------------------------------------
To unsubscribe, e-mail: commons-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-user-help@jakarta.apache.org

Re: [JXPath] unordered node-sets

Posted by Dmitri Plotnikov <dm...@apache.org>.

Knut,

Perhaps my reading of the XPath specification is incorrect, but I understood 
it to say that node sets are in fact ordered.  The order of elements in the 
set is determined by the document order as well as the axis used (for 
example, "preceding::" will return nodes in the reverse document order). 
When I was first developing JXPath, I struggled with the issue of document 
order as it applies to the non-XML object models.  The best solution I could 
come up with was to impose some predictable order that makes sense.  Of 
course the first one that came to mind was the alphabetical order.  So, this 
is why property names of Java Beans are sorted by JXPath machinery.  That's 
the only sorting I can think of that goes on in JXPath.

As to your specific issue, I cannot really make any specific recommendations 
without seeing the code.  Would it be possible to post a code sample that 
illustrates the problem?

Thank you,

- Dmitri


----- Original Message ----- 
From: "Knut Wannheden" <kn...@gmail.com>
To: <co...@jakarta.apache.org>
Sent: Friday, January 14, 2005 4:37 AM
Subject: [JXPath] unordered node-sets


> Dear JXPath users and developers,
>
> Some XPath expressions (e.g. "//foo" which uses the descendant axis), when
> applied to a Java object graph, can cause JXPath to traverse a vast number 
> of
> objects, which potentially takes a lot of time. This is what happens when
> hasNext() or next() is called on the Iterator returned by
> JXPathContext#iterate(String). Moreover, as the call to hasNext() or 
> next()
> seems to cause all matches to be pre-fetched and ordered (at least when 
> the
> expression starts with the descendant axis), the call appears to be 
> blocking.
>
> I was wondering if there is a way to iterate through the matched node-set 
> in a
> non-blocking manner. My understanding was that node-sets per definition 
> are
> unordered.
>
> TIA,
>
> --knut


---------------------------------------------------------------------
To unsubscribe, e-mail: commons-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-user-help@jakarta.apache.org