You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@jackrabbit.apache.org by Martijn Hendriks <ma...@gx.nl> on 2007/08/22 11:34:01 UTC

Database connections & queries

Hi all,

Recently we discovered that our server application needs to be restarted
if Jackrabbit looses its database connection (which was already
documented in JCR-940). We really need the reconnection logic and
created a patch (feedback much appreciated :)). As a side effect of this
we also noticed the following w.r.t. the query system: The current
implementation of the LazyScoreNodeIterator is such to ignore all
RepositoryExceptions that might be thrown while loading the nodes in the
result set, as required by the javax.jcr.NodeIterator spec. For the use
case where the database connection is lost, the iterator thus gives an
incomplete result: only the nodes that are currently cached are
returned.

The incomplete search result that is due to a lost connection is bad for
us as our app caches search results which we know to be stable. This can
be solved by letting the persistence managers block while there's no
database connection available. (The patch for JCR-940 does not do this,
however). Of course this has its advantages and disadvantages; maybe the
PMs behaviour w.r.t. a lost connection should be configurable via the
repository.xml?

Best regards,

Martijn

--

Martijn Hendriks
<GX> creative online development B.V.
 
t: 024 - 3888 261
f: 024 - 3888 621
e: martijnh@gx.nl
 
Wijchenseweg 111
6538 SW Nijmegen
http://www.gx.nl/

Re: Database connections & queries

Posted by Stefan Guggisberg <st...@gmail.com>.

hi martijn,

On 8/22/07, Martijn Hendriks <ma...@gx.nl> wrote:
> Hi all,
>
> Recently we discovered that our server application needs to be restarted
> if Jackrabbit looses its database connection (which was already
> documented in JCR-940). We really need the reconnection logic and
> created a patch (feedback much appreciated :)). As a side effect of this
> we also noticed the following w.r.t. the query system: The current
> implementation of the LazyScoreNodeIterator is such to ignore all
> RepositoryExceptions that might be thrown while loading the nodes in the
> result set, as required by the javax.jcr.NodeIterator spec. For the use
> case where the database connection is lost, the iterator thus gives an
> incomplete result: only the nodes that are currently cached are
> returned.
>
> The incomplete search result that is due to a lost connection is bad for
> us as our app caches search results which we know to be stable. This can

i agree that this is a problem.

> be solved by letting the persistence managers block while there's no
> database connection available. (The patch for JCR-940 does not do this,
> however). Of course this has its advantages and disadvantages; maybe the
> PMs behaviour w.r.t. a lost connection should be configurable via the
> repository.xml?

i don't think that a call to the PM should unconditionally block if it lost the
connection to the backend. it should report the error if it didn't succeed in
re-establishing the connection. otherwise JCR api calls could potentially
block forever.

wrt to your problem: you could e.g. compare the value returned by
NodeIterator#getSize with the number of successful #nextNode calls
in order to detect skipped entries. however, entries could be skipped for
other reasons (like e.g. access control).

cheers
stefan

>
> Best regards,
>
> Martijn
>
> --
>
> Martijn Hendriks
> <GX> creative online development B.V.
>
> t: 024 - 3888 261
> f: 024 - 3888 621
> e: martijnh@gx.nl
>
> Wijchenseweg 111
> 6538 SW Nijmegen
> http://www.gx.nl/
>

Re: Database connections & queries

Posted by Jukka Zitting <ju...@gmail.com>.

Hi,

On 8/29/07, Stefan Guggisberg <st...@gmail.com> wrote:
> On 8/28/07, Jukka Zitting <ju...@gmail.com> wrote:
> > A thought just crossed my mind... As mentioned in another thread a few
> > days ago, we could make NodeImpl to load the underlying node states
> > only on demand as long as we have just the node ID, parent ID, and
> > name of the node available. This way we could "delay" the exception to
> > a method like getProperty() that is allowed to throw appropriate
> > exceptions.
> >
> > Besides, such a solution would most likely give a nice speedup to
> > clients that just want to get the names or paths of nodes that match a
> > query...
>
> sounds interesting. however, with our current Node/PropertyState model
> you would still need to load NodeState's along an item's path in order
> to resolve its path since names are stored in the parent state.

True. Intermediate nodes higher up on the content tree would usually
be in the cache, so I think the performance benefits would still be
pretty good.

BR,

Jukka Zitting

Re: Database connections & queries

Posted by Stefan Guggisberg <st...@gmail.com>.

On 8/28/07, Jukka Zitting <ju...@gmail.com> wrote:
> Hi,
>
> On 8/24/07, Jukka Zitting <ju...@gmail.com> wrote:
> > On 8/22/07, Martijn Hendriks <ma...@gx.nl> wrote:
> > > The current implementation of the LazyScoreNodeIterator is such to
> > > ignore all RepositoryExceptions that might be thrown while loading the
> > > nodes in the result set, as required by the javax.jcr.NodeIterator spec.
> >
> > I think the correct behaviour would be for the node iterator to throw
> > an exception (unfortunately I guess only a RuntimeException would work
> > here) instead of ignoring such internal errors.
>
> A thought just crossed my mind... As mentioned in another thread a few
> days ago, we could make NodeImpl to load the underlying node states
> only on demand as long as we have just the node ID, parent ID, and
> name of the node available. This way we could "delay" the exception to
> a method like getProperty() that is allowed to throw appropriate
> exceptions.
>
> Besides, such a solution would most likely give a nice speedup to
> clients that just want to get the names or paths of nodes that match a
> query...

sounds interesting. however, with our current Node/PropertyState model
you would still need to load NodeState's along an item's path in order
to resolve its path since names are stored in the parent state.

cheers
stefan

>
> BR,
>
> Jukka Zitting
>

Re: Database connections & queries

Posted by Stefan Guggisberg <st...@gmail.com>.

On 8/29/07, Martijn Hendriks <ma...@gx.nl> wrote:
> Hi,
>
> That's a nice idea! But wouldn't it be confusing that one can get a Node
> object through the nextNode() method which does not exist in the
> repository anymore?
>
> Would this proposal also make the Node.getNodes(String pattern) method
> less expensive? As I gather now all infrastructure for the child nodes
> is created but might not be used if the child nodes name does not fit
> the pattern.

the performance of getNodes(String) could easily be improved by
first evaluating the name pattern against the child node entry names.

feel free to file an improvement issue. patches would of course be
very welcome ;)

cheers
stefan


>
> Best regards,
>
> Martijn
>
> --
>
> Martijn Hendriks
> <GX> creative online development B.V.
>
> t: 024 - 3888 261
> f: 024 - 3888 621
> e: martijnh@gx.nl
>
> Wijchenseweg 111
> 6538 SW Nijmegen
> http://www.gx.nl/
>
> > -----Original Message-----
> > From: Marcel Reutegger [mailto:marcel.reutegger@gmx.net]
> > Sent: Wednesday, August 29, 2007 10:14 AM
> > To: dev@jackrabbit.apache.org
> > Subject: Re: Database connections & queries
> >
> > Jukka Zitting wrote:
> > > A thought just crossed my mind... As mentioned in another
> > thread a few
> > > days ago, we could make NodeImpl to load the underlying node states
> > > only on demand as long as we have just the node ID, parent ID, and
> > > name of the node available. This way we could "delay" the
> > exception to
> > > a method like getProperty() that is allowed to throw appropriate
> > > exceptions.
> > >
> > > Besides, such a solution would most likely give a nice speedup to
> > > clients that just want to get the names or paths of nodes
> > that match a
> > > query...
> >
> > +1, sounds like a very interesting idea.
> >
> > regards
> >   marcel
> >
>

RE: Database connections & queries

Posted by Martijn Hendriks <ma...@gx.nl>.

> That seems worrisome... Could there be a bug in the search 
> index update code for deleting entries? Something like that 
> could easily have been ignored so far if the only effect is a 
> warning in the log due to the current approach of simply 
> dropping the results for which an exception is thrown.

There's an issue with Lucene 2.0 (see
http://issues.apache.org/jira/browse/LUCENE-669 and
http://issues.apache.org/jira/browse/LUCENE-750) which might cause
IOExceptions. This is logged very clearly in the SearchManager.onEvent
method however and I haven't seen such messages.

The problem is that it happens on only one installation to which we have
very little access. We are not able to reproduce it and debugging is
therefore quite problematic...

Best wishes,

Martijn

Re: Database connections & queries

Posted by Jukka Zitting <ju...@gmail.com>.

Hi,

On 8/30/07, Martijn Hendriks <ma...@gx.nl> wrote:
> We do have a very concrete problem which is related: an installation of
> our product which uses Jackrabbit for persistence keeps logging lots of
> warnings (in the order of thousands per day):
>
> Aug 29, 2007 4:56:33 PM
> org.apache.jackrabbit.core.query.lucene.LazyQueryResultImpl$LazyScoreNod
> eIterator fetchNext
> WARNING: Exception retrieving Node with UUID:
> b11aa8a2-beed-4d24-95a0-592b6b193534: javax.jcr.ItemNotFoundException:
> b11aa8a2-beed-4d24-95a0-592b6b193534
>
> Re-building the search index does not help: the logging of these
> warnings just continuous. I just can't believe that these result from a
> save that throws away many nodes while another thread loops over a query
> result. Any thoughts on this?

That seems worrisome... Could there be a bug in the search index
update code for deleting entries? Something like that could easily
have been ignored so far if the only effect is a warning in the log
due to the current approach of simply dropping the results for which
an exception is thrown.

BR,

Jukka Zitting

RE: Database connections & queries

Posted by Martijn Hendriks <ma...@gx.nl>.

Hi,

I agree that it isn't a big problem; your example shows that approx.
equivalent behaviour could already occur now.

We do have a very concrete problem which is related: an installation of
our product which uses Jackrabbit for persistence keeps logging lots of
warnings (in the order of thousands per day):

Aug 29, 2007 4:56:33 PM
org.apache.jackrabbit.core.query.lucene.LazyQueryResultImpl$LazyScoreNod
eIterator fetchNext
WARNING: Exception retrieving Node with UUID:
b11aa8a2-beed-4d24-95a0-592b6b193534: javax.jcr.ItemNotFoundException:
b11aa8a2-beed-4d24-95a0-592b6b193534

Re-building the search index does not help: the logging of these
warnings just continuous. I just can't believe that these result from a
save that throws away many nodes while another thread loops over a query
result. Any thoughts on this?

Best regards,

Martijn

--

Martijn Hendriks
<GX> creative online development B.V.
 
t: 024 - 3888 261
f: 024 - 3888 621
e: martijnh@gx.nl
 
Wijchenseweg 111
6538 SW Nijmegen
http://www.gx.nl/  

> -----Original Message-----
> From: Jukka Zitting [mailto:jukka.zitting@gmail.com] 
> Sent: Thursday, August 30, 2007 9:40 AM
> To: dev@jackrabbit.apache.org
> Subject: Re: Database connections & queries
> 
> Hi,
> 
> On 8/30/07, Martijn Hendriks <ma...@gx.nl> wrote:
> > If we don't take a clustered setup into account, then indeed the 
> > search index should always be in sync with the persistent 
> state. The 
> > ScoreNode objects in the LazyQueryResultImpls resultNodes 
> field are, 
> > however, in general not in sync with the persistent state 
> as they are 
> > always loaded during the construction of the LazyQueryResultImpl 
> > (because the
> > Searchindex.getResultFetchSize() currently returns 
> Integer.MAX_VALUE).
> > Thus, after construction of the LazyQueryResultImpl another thread 
> > could remove nodes that are in the resultNodes array of the 
> > LazyQueryResultImpl with the behaviour described above as a 
> consequence.
> 
> I don't see that as a big problem. It's roughly equivalent to 
> the following case:
> 
>     Session sessionA = ...;
>     Session sessionB = ...;
> 
>     Node node = sessionA.getRootNode().getNode("path/to/node");
> 
>     sessionB.getRootNode().getNode("path/to/node").remove();
>     sessionB.save();
> 
>     node.getProperty(...);
> 
> BR,
> 
> Jukka Zitting
>

Re: Database connections & queries

Posted by Jukka Zitting <ju...@gmail.com>.

Hi,

On 8/30/07, Martijn Hendriks <ma...@gx.nl> wrote:
> If we don't take a clustered setup into account, then indeed the search
> index should always be in sync with the persistent state. The ScoreNode
> objects in the LazyQueryResultImpls resultNodes field are, however, in
> general not in sync with the persistent state as they are always loaded
> during the construction of the LazyQueryResultImpl (because the
> Searchindex.getResultFetchSize() currently returns Integer.MAX_VALUE).
> Thus, after construction of the LazyQueryResultImpl another thread could
> remove nodes that are in the resultNodes array of the
> LazyQueryResultImpl with the behaviour described above as a consequence.

I don't see that as a big problem. It's roughly equivalent to the
following case:

    Session sessionA = ...;
    Session sessionB = ...;

    Node node = sessionA.getRootNode().getNode("path/to/node");

    sessionB.getRootNode().getNode("path/to/node").remove();
    sessionB.save();

    node.getProperty(...);

BR,

Jukka Zitting

RE: Database connections & queries

Posted by Martijn Hendriks <ma...@gx.nl>.

Hi,

> -----Original Message-----
> From: Jukka Zitting [mailto:jukka.zitting@gmail.com] 

> On 8/29/07, Martijn Hendriks <ma...@gx.nl> wrote:
> > That's a nice idea! But wouldn't it be confusing that one can get a 
> > Node object through the nextNode() method which does not 
> exist in the 
> > repository anymore?
> 
> The search index should always be in sync with the persistent 
> state, so such situations should not happen.

If we don't take a clustered setup into account, then indeed the search
index should always be in sync with the persistent state. The ScoreNode
objects in the LazyQueryResultImpls resultNodes field are, however, in
general not in sync with the persistent state as they are always loaded
during the construction of the LazyQueryResultImpl (because the
Searchindex.getResultFetchSize() currently returns Integer.MAX_VALUE).
Thus, after construction of the LazyQueryResultImpl another thread could
remove nodes that are in the resultNodes array of the
LazyQueryResultImpl with the behaviour described above as a consequence.

Best wishes,

Martijn

Re: Database connections & queries

Posted by Jukka Zitting <ju...@gmail.com>.

Hi,

On 8/29/07, Martijn Hendriks <ma...@gx.nl> wrote:
> That's a nice idea! But wouldn't it be confusing that one can get a Node
> object through the nextNode() method which does not exist in the
> repository anymore?

The search index should always be in sync with the persistent state,
so such situations should not happen.

> Would this proposal also make the Node.getNodes(String pattern) method
> less expensive? As I gather now all infrastructure for the child nodes
> is created but might not be used if the child nodes name does not fit
> the pattern.

All the node iterators should become faster for clients that just want
to traverse a tree without looking at properties or typing
information.

Currently the names and identifiers of all child nodes are stored as a
part of the parent node state, so there would be no need to
immediately retrieve the child node states when performing calls like
Node.getNodes() or Node.getNodes(Strin pattern).

BR,

Jukka Zitting

RE: Database connections & queries

Posted by Martijn Hendriks <ma...@gx.nl>.

Hi,

That's a nice idea! But wouldn't it be confusing that one can get a Node
object through the nextNode() method which does not exist in the
repository anymore?

Would this proposal also make the Node.getNodes(String pattern) method
less expensive? As I gather now all infrastructure for the child nodes
is created but might not be used if the child nodes name does not fit
the pattern.

Best regards,

Martijn

--

Martijn Hendriks
<GX> creative online development B.V.

t: 024 - 3888 261
f: 024 - 3888 621
e: martijnh@gx.nl

Wijchenseweg 111
6538 SW Nijmegen
http://www.gx.nl/  

> -----Original Message-----
> From: Marcel Reutegger [mailto:marcel.reutegger@gmx.net] 
> Sent: Wednesday, August 29, 2007 10:14 AM
> To: dev@jackrabbit.apache.org
> Subject: Re: Database connections & queries
> 
> Jukka Zitting wrote:
> > A thought just crossed my mind... As mentioned in another 
> thread a few 
> > days ago, we could make NodeImpl to load the underlying node states 
> > only on demand as long as we have just the node ID, parent ID, and 
> > name of the node available. This way we could "delay" the 
> exception to 
> > a method like getProperty() that is allowed to throw appropriate 
> > exceptions.
> > 
> > Besides, such a solution would most likely give a nice speedup to 
> > clients that just want to get the names or paths of nodes 
> that match a 
> > query...
> 
> +1, sounds like a very interesting idea.
> 
> regards
>   marcel
>

Re: Database connections & queries

Posted by Marcel Reutegger <ma...@gmx.net>.

Jukka Zitting wrote:
> A thought just crossed my mind... As mentioned in another thread a few
> days ago, we could make NodeImpl to load the underlying node states
> only on demand as long as we have just the node ID, parent ID, and
> name of the node available. This way we could "delay" the exception to
> a method like getProperty() that is allowed to throw appropriate
> exceptions.
> 
> Besides, such a solution would most likely give a nice speedup to
> clients that just want to get the names or paths of nodes that match a
> query...

+1, sounds like a very interesting idea.

regards
  marcel

Re: Database connections & queries

Posted by "(Berry) A.W. van Halderen" <b....@hippo.nl>.

On Tue, Aug 28, 2007 at 06:55:14PM +0300, Jukka Zitting wrote:
> On 8/24/07, Jukka Zitting <ju...@gmail.com> wrote:
> > On 8/22/07, Martijn Hendriks <ma...@gx.nl> wrote:
> > > The current implementation of the LazyScoreNodeIterator is such to
> > > ignore all RepositoryExceptions that might be thrown while loading the
> > > nodes in the result set, as required by the javax.jcr.NodeIterator spec.
> > I think the correct behaviour would be for the node iterator to throw
> > an exception (unfortunately I guess only a RuntimeException would work
> > here) instead of ignoring such internal errors.
> 
> A thought just crossed my mind... As mentioned in another thread a few
> days ago, we could make NodeImpl to load the underlying node states
> only on demand as long as we have just the node ID, parent ID, and
> name of the node available. This way we could "delay" the exception to
> a method like getProperty() that is allowed to throw appropriate
> exceptions.
> 
> Besides, such a solution would most likely give a nice speedup to
> clients that just want to get the names or paths of nodes that match a
> query...

Very good idea, decoupling the actual node contents (properties,children)
from its outer shell (node id, etc) opens up all kind of possiblities.
The NoSuchElement solution is IMO not preferable, as its semantics is
already well defined for iterators and not not match this use.

\Berry
-- 
Berry A.W. van Halderen           b.vanhalderen@hippo.nl / berry@halderen.net
Disclaimer: the above is the author's personal opinion and is not the opinion
or policy of his employer or of the little green men that have been following
him all day.

Re: Database connections & queries

Posted by Jukka Zitting <ju...@gmail.com>.

Hi,

On 8/24/07, Jukka Zitting <ju...@gmail.com> wrote:
> On 8/22/07, Martijn Hendriks <ma...@gx.nl> wrote:
> > The current implementation of the LazyScoreNodeIterator is such to
> > ignore all RepositoryExceptions that might be thrown while loading the
> > nodes in the result set, as required by the javax.jcr.NodeIterator spec.
>
> I think the correct behaviour would be for the node iterator to throw
> an exception (unfortunately I guess only a RuntimeException would work
> here) instead of ignoring such internal errors.

A thought just crossed my mind... As mentioned in another thread a few
days ago, we could make NodeImpl to load the underlying node states
only on demand as long as we have just the node ID, parent ID, and
name of the node available. This way we could "delay" the exception to
a method like getProperty() that is allowed to throw appropriate
exceptions.

Besides, such a solution would most likely give a nice speedup to
clients that just want to get the names or paths of nodes that match a
query...

BR,

Jukka Zitting

RE: Database connections & queries

Posted by Martijn Hendriks <ma...@gx.nl>.

Hi,

Using an unchecked exception would help but gives a backward
compatibility issue (which is for us no problem though but it might be
for others). As Stefan already mentioned previously in the thread, there
are several situations in which it is desired that the node iterator
skips entries (access control, nodes deleted by another thread, ...). It
would be nice if the code in the LazyScoreNodeIterator.fetchNext()
method could check the cause of the RepositoryException and then, if it
is a real unexpected error such as a broken database connection, throw
an unchecked exception.

Best wishes,

Martijn

--

Martijn Hendriks
<GX> creative online development B.V.

t: 024 - 3888 261
f: 024 - 3888 621
e: martijnh@gx.nl

Wijchenseweg 111
6538 SW Nijmegen
http://www.gx.nl/  

> -----Original Message-----
> From: Jukka Zitting [mailto:jukka.zitting@gmail.com] 
> Sent: Monday, August 27, 2007 12:13 PM
> To: dev@jackrabbit.apache.org
> Subject: Re: Database connections & queries
> 
> Hi,
> 
> On 8/27/07, Thomas Mueller <th...@gmail.com> wrote:
> > > We could change the behaviour of the iterator to throw a 
> NoSuchElementException.
> > > That's an exception already declared on nextNode().
> >
> > What about a new JackrabbitRuntimeException? 
> NoSuchElementException is 
> > a RuntimeException as well. NoSuchElementException means "the 
> > iteration has no more elements", and is thrown if next() was called 
> > too many times. I prefer fewer exception classes, but in this case 
> > reusing NoSuchElementException doesn't feel right (to me), 
> because the 
> > same exception is thrown for a common user error (not calling or 
> > ignoring hasNext()), and internal problems like dropped connections.
> 
> +1 Using NoSuchElementException would make it difficult for a client
> to determine the real cause of the failure.
> 
> This is actually something we may want to take up in JSR 283, 
> perhaps even going as far as adding a RepositoryException to 
> nextNode()...
> 
> BR,
> 
> Jukka Zitting
>

Re: Database connections & queries

Posted by Jukka Zitting <ju...@gmail.com>.

Hi,

On 8/27/07, Thomas Mueller <th...@gmail.com> wrote:
> > We could change the behaviour of the iterator to throw a NoSuchElementException.
> > That's an exception already declared on nextNode().
>
> What about a new JackrabbitRuntimeException? NoSuchElementException is
> a RuntimeException as well. NoSuchElementException means "the
> iteration has no more elements", and is thrown if next() was called
> too many times. I prefer fewer exception classes, but in this case
> reusing NoSuchElementException doesn't feel right (to me), because the
> same exception is thrown for a common user error (not calling or
> ignoring hasNext()), and internal problems like dropped connections.

+1 Using NoSuchElementException would make it difficult for a client
to determine the real cause of the failure.

This is actually something we may want to take up in JSR 283, perhaps
even going as far as adding a RepositoryException to nextNode()...

BR,

Jukka Zitting

Re: Database connections & queries

Posted by Thomas Mueller <th...@gmail.com>.

Hi,

> We could change the behaviour of the iterator to throw a NoSuchElementException.
> That's an exception already declared on nextNode().

What about a new JackrabbitRuntimeException? NoSuchElementException is
a RuntimeException as well. NoSuchElementException means "the
iteration has no more elements", and is thrown if next() was called
too many times. I prefer fewer exception classes, but in this case
reusing NoSuchElementException doesn't feel right (to me), because the
same exception is thrown for a common user error (not calling or
ignoring hasNext()), and internal problems like dropped connections.

Thomas

Re: Database connections & queries

Posted by Marcel Reutegger <ma...@gmx.net>.

Jukka Zitting wrote:
> On 8/22/07, Martijn Hendriks <ma...@gx.nl> wrote:
>> The current implementation of the LazyScoreNodeIterator is such to
>> ignore all RepositoryExceptions that might be thrown while loading the
>> nodes in the result set, as required by the javax.jcr.NodeIterator spec.
> 
> I think the correct behaviour would be for the node iterator to throw
> an exception (unfortunately I guess only a RuntimeException would work
> here) instead of ignoring such internal errors.

I agree.

We could change the behaviour of the iterator to throw a NoSuchElementException. 
That's an exception already declared on nextNode().

Would that work for you Martijn?

regards
  marcel

Re: Database connections & queries

Posted by Jukka Zitting <ju...@gmail.com>.

Hi,

On 8/22/07, Martijn Hendriks <ma...@gx.nl> wrote:
> The current implementation of the LazyScoreNodeIterator is such to
> ignore all RepositoryExceptions that might be thrown while loading the
> nodes in the result set, as required by the javax.jcr.NodeIterator spec.

I think the correct behaviour would be for the node iterator to throw
an exception (unfortunately I guess only a RuntimeException would work
here) instead of ignoring such internal errors.

BR,

Jukka Zitting