You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@jackrabbit.apache.org by Sergey Podatelev <br...@gmail.com> on 2010/02/02 14:10:34 UTC

Index corruption

Hello,

I'm using Jackrabbit 1.5.3 and sometimes I run into index corruption
problem. I can't consistently reproduce the issue, so upgrading to 1.6
or 2.0 is not an option for me, as I'm not sure my exact issue is
solved there.
I'm also not in the liberty of exposing my actual repository
structure, so what I'm asking here is some pointers where exactly too
look at.

Okay, here goes.
The structure of the repo is the following:

jcr:root <- Node 1 <- Node 2 <- Node 3

The scenario at which I'm (sometimes) getting this corruption is the following:

- move Node 3 to be the kid of Node 1
- remove Node 2
- save session
- remove Node 3
- SS
- append Node 4 to Node 1
- SS
- append Node 5 to Node 4
- SS

- perform processing of the resulting tree, which includes reading the
tree and properties being added and removed to all nodes sequentially.
once each node is processed, session.save() is called;

During the processing, I sometimes get the following exception:

2010-01-29 21:39:50,456 DEBUG observation.ObservationDispatcher -
notifying 3 synchronous listeners.
2010-01-29 21:39:50,456 DEBUG core.SearchManager - onEvent: indexing started
2010-01-29 21:39:50,461 WARN  lucene.SearchIndex - Exception while
creating document for node: 1cea61c4-8e20-4aed-b1f2-21600ff65101:
javax.jcr.RepositoryException: Error while indexing node:
1cea61c4-8e20-4aed-b1f2-21600ff65101 of type:
{http://www.jcp.org/jcr/nt/1.0}unstructured:
1cea61c4-8e20-4aed-b1f2-21600ff65101/{}date-updated:
1cea61c4-8e20-4aed-b1f2-21600ff65101/{}date-updated

It is this exception which leads to corrupted index once the
processing is done. If it's thrown during the processing, I then have
query that looks like "//Node" (which is supposed to return 3 elements
currently representing the "tree") only return one element -- Node 1.

Perhaps it also worth noticing that besides this "Node N" elements,
there's a subtree of about 10-15MB of data in the repo, not sure if
this is relevant.

Anyway, if someone can give me any insights regarding what might cause
this issue or how can I reproduce it consistently, that would be
great.

Manually removing index files from fs datastore and restarting the app
is not a suitable workaround for me.



-- 
sp

Re: Index corruption

Posted by Sergey Podatelev <br...@gmail.com>.
On Mon, Feb 15, 2010 at 3:41 PM, Marcel Reutegger
<ma...@gmx.net> wrote:
> Hi,
>
> On Sat, Feb 13, 2010 at 16:41, Sergey Podatelev
> <br...@gmail.com> wrote:

>> My environment is a web application where
>> jackrabbit is used as an application bundle, and it's configured for
>> distributed transactions support along with mysql connector using
>> Jencks.
>
> hmm, maybe this is related to the distributed transaction... does it
> also happen when you are *not* using XA?
>

Once I removed XA configuration for objects related, I haven't been
able to reproduce that exception, so it seems that XA is involved
indeed.
Do you need any specifics on my configuration?

-- 
sp

Re: Index corruption

Posted by Marcel Reutegger <ma...@gmx.net>.
Hi,

On Sat, Feb 13, 2010 at 16:41, Sergey Podatelev
<br...@gmail.com> wrote:
> Marcel, I'm having troubles providing a reliable test-case for this,
> since in actual production environment this happens only about one
> time out of six, and the environment itself is a pretty complex one.
>
> Can you perhaps give me some pointers on what might be a cause for
> this one except multi-threaded access or what could increase the
> probability of the error? My environment is a web application where
> jackrabbit is used as an application bundle, and it's configured for
> distributed transactions support along with mysql connector using
> Jencks.

hmm, maybe this is related to the distributed transaction... does it
also happen when you are *not* using XA?

regards
 marcel

> Like I said, the error happens when there're a lot of consecutive
> repository writes with calls to session.save() upon each write,
> however, they're all triggered by a single web request.
>
> On Mon, Feb 8, 2010 at 11:19 AM, Marcel Reutegger
> <ma...@gmx.net> wrote:
>> Hi,
>>
>> 2010/2/5 Sergey Podatelev <br...@gmail.com>:
>>> Thanks for your reply, Marcel.
>>>
>>> On Thu, Feb 4, 2010 at 8:12 PM, Marcel Reutegger
>>> <ma...@gmx.net> wrote:
>>>
>>>> is that a single threaded environment or is it possible that multiple
>>>> threads share a session? that's not supported.
>>>
>>> The Session instance is only utilized by a single thread.
>>>
>>>> what's the root cause of the exception?
>>>
>>> Here goes, the silly-looking printouts are those that I put in
>>> SearchIndex myself:
>>>
>>> 2010-02-05 19:48:20,794 DEBUG observation.ObservationDispatcher -
>>> notifying 3 synchronous listeners.
>>> 2010-02-05 19:48:20,794 DEBUG core.SearchManager - onEvent: indexing started
>>> 2010-02-05 19:48:20,795 DEBUG lucene.SearchIndex - search index: updateNodes()
>>> 2010-02-05 19:48:20,796 DEBUG lucene.SearchIndex - trying to create document
>>> 2010-02-05 19:48:20,797 WARN  lucene.SearchIndex - Exception while
>>> creating document for node: 4977b098-bba2-4274-ae7d-7575d2679526:
>>> javax.jcr.RepositoryException: Error while indexing node:
>>> 4977b098-bba2-4274-ae7d-7575d2679526 of type:
>>> {http://www.jcp.org/jcr/nt/1.0}unstructured:
>>> 4977b098-bba2-4274-ae7d-7575d2679526/{}date-updated:
>>> 4977b098-bba2-4274-ae7d-7575d2679526/{}date-updated
>>>
>>> 2010-02-05 19:48:20,798 DEBUG lucene.SearchIndex - whoopsy, failed!,
>>> root cause: org.apache.jackrabbit.core.state.NoSuchItemStateException:
>>> 4977b098-bba2-4274-ae7d-7575d2679526/{}date-updated
>>
>> that looks like a node indicates that there is a data-updated
>> property, but when trying to read the property it turns out it doesn't
>> exist. that's quite strange. usually such situations may occur when
>> multiple threads access the same session instance and modify content
>> concurrently.
>>
>> are you able to provide a test case that reproduces the issue?
>>
>> regards
>>  marcel
>>
>>> 2010-02-05 19:48:20,798 DEBUG lucene.SearchIndex - trying to create document
>>> 2010-02-05 19:48:20,800 DEBUG core.SearchManager - onEvent: indexing
>>> finished in 6 ms.
>>>
>>>> are you able to re-index the workspace after such an event or is the
>>>> data in the persistence manager corrupt as well?
>>>
>>> If I just restart the app or the server, JackRabbit says the redo.log
>>> is empty, so there's nothing to re-index, thus the state of the
>>> indexes stays wrong, and I still get wrong query results.
>>>
>>> Removing the contents of $home$/index and
>>> $home$/workspaces/default/index directories and then restarting the
>>> server, restores proper index.
>>>
>>> --
>>> sp
>>>
>>
>
>
>
> --
> sp
>

Re: Index corruption

Posted by Sergey Podatelev <br...@gmail.com>.
Marcel, I'm having troubles providing a reliable test-case for this,
since in actual production environment this happens only about one
time out of six, and the environment itself is a pretty complex one.

Can you perhaps give me some pointers on what might be a cause for
this one except multi-threaded access or what could increase the
probability of the error? My environment is a web application where
jackrabbit is used as an application bundle, and it's configured for
distributed transactions support along with mysql connector using
Jencks.

Like I said, the error happens when there're a lot of consecutive
repository writes with calls to session.save() upon each write,
however, they're all triggered by a single web request.

On Mon, Feb 8, 2010 at 11:19 AM, Marcel Reutegger
<ma...@gmx.net> wrote:
> Hi,
>
> 2010/2/5 Sergey Podatelev <br...@gmail.com>:
>> Thanks for your reply, Marcel.
>>
>> On Thu, Feb 4, 2010 at 8:12 PM, Marcel Reutegger
>> <ma...@gmx.net> wrote:
>>
>>> is that a single threaded environment or is it possible that multiple
>>> threads share a session? that's not supported.
>>
>> The Session instance is only utilized by a single thread.
>>
>>> what's the root cause of the exception?
>>
>> Here goes, the silly-looking printouts are those that I put in
>> SearchIndex myself:
>>
>> 2010-02-05 19:48:20,794 DEBUG observation.ObservationDispatcher -
>> notifying 3 synchronous listeners.
>> 2010-02-05 19:48:20,794 DEBUG core.SearchManager - onEvent: indexing started
>> 2010-02-05 19:48:20,795 DEBUG lucene.SearchIndex - search index: updateNodes()
>> 2010-02-05 19:48:20,796 DEBUG lucene.SearchIndex - trying to create document
>> 2010-02-05 19:48:20,797 WARN  lucene.SearchIndex - Exception while
>> creating document for node: 4977b098-bba2-4274-ae7d-7575d2679526:
>> javax.jcr.RepositoryException: Error while indexing node:
>> 4977b098-bba2-4274-ae7d-7575d2679526 of type:
>> {http://www.jcp.org/jcr/nt/1.0}unstructured:
>> 4977b098-bba2-4274-ae7d-7575d2679526/{}date-updated:
>> 4977b098-bba2-4274-ae7d-7575d2679526/{}date-updated
>>
>> 2010-02-05 19:48:20,798 DEBUG lucene.SearchIndex - whoopsy, failed!,
>> root cause: org.apache.jackrabbit.core.state.NoSuchItemStateException:
>> 4977b098-bba2-4274-ae7d-7575d2679526/{}date-updated
>
> that looks like a node indicates that there is a data-updated
> property, but when trying to read the property it turns out it doesn't
> exist. that's quite strange. usually such situations may occur when
> multiple threads access the same session instance and modify content
> concurrently.
>
> are you able to provide a test case that reproduces the issue?
>
> regards
>  marcel
>
>> 2010-02-05 19:48:20,798 DEBUG lucene.SearchIndex - trying to create document
>> 2010-02-05 19:48:20,800 DEBUG core.SearchManager - onEvent: indexing
>> finished in 6 ms.
>>
>>> are you able to re-index the workspace after such an event or is the
>>> data in the persistence manager corrupt as well?
>>
>> If I just restart the app or the server, JackRabbit says the redo.log
>> is empty, so there's nothing to re-index, thus the state of the
>> indexes stays wrong, and I still get wrong query results.
>>
>> Removing the contents of $home$/index and
>> $home$/workspaces/default/index directories and then restarting the
>> server, restores proper index.
>>
>> --
>> sp
>>
>



-- 
sp

Re: Index corruption

Posted by Marcel Reutegger <ma...@gmx.net>.
Hi,

2010/2/5 Sergey Podatelev <br...@gmail.com>:
> Thanks for your reply, Marcel.
>
> On Thu, Feb 4, 2010 at 8:12 PM, Marcel Reutegger
> <ma...@gmx.net> wrote:
>
>> is that a single threaded environment or is it possible that multiple
>> threads share a session? that's not supported.
>
> The Session instance is only utilized by a single thread.
>
>> what's the root cause of the exception?
>
> Here goes, the silly-looking printouts are those that I put in
> SearchIndex myself:
>
> 2010-02-05 19:48:20,794 DEBUG observation.ObservationDispatcher -
> notifying 3 synchronous listeners.
> 2010-02-05 19:48:20,794 DEBUG core.SearchManager - onEvent: indexing started
> 2010-02-05 19:48:20,795 DEBUG lucene.SearchIndex - search index: updateNodes()
> 2010-02-05 19:48:20,796 DEBUG lucene.SearchIndex - trying to create document
> 2010-02-05 19:48:20,797 WARN  lucene.SearchIndex - Exception while
> creating document for node: 4977b098-bba2-4274-ae7d-7575d2679526:
> javax.jcr.RepositoryException: Error while indexing node:
> 4977b098-bba2-4274-ae7d-7575d2679526 of type:
> {http://www.jcp.org/jcr/nt/1.0}unstructured:
> 4977b098-bba2-4274-ae7d-7575d2679526/{}date-updated:
> 4977b098-bba2-4274-ae7d-7575d2679526/{}date-updated
>
> 2010-02-05 19:48:20,798 DEBUG lucene.SearchIndex - whoopsy, failed!,
> root cause: org.apache.jackrabbit.core.state.NoSuchItemStateException:
> 4977b098-bba2-4274-ae7d-7575d2679526/{}date-updated

that looks like a node indicates that there is a data-updated
property, but when trying to read the property it turns out it doesn't
exist. that's quite strange. usually such situations may occur when
multiple threads access the same session instance and modify content
concurrently.

are you able to provide a test case that reproduces the issue?

regards
 marcel

> 2010-02-05 19:48:20,798 DEBUG lucene.SearchIndex - trying to create document
> 2010-02-05 19:48:20,800 DEBUG core.SearchManager - onEvent: indexing
> finished in 6 ms.
>
>> are you able to re-index the workspace after such an event or is the
>> data in the persistence manager corrupt as well?
>
> If I just restart the app or the server, JackRabbit says the redo.log
> is empty, so there's nothing to re-index, thus the state of the
> indexes stays wrong, and I still get wrong query results.
>
> Removing the contents of $home$/index and
> $home$/workspaces/default/index directories and then restarting the
> server, restores proper index.
>
> --
> sp
>

Re: Index corruption

Posted by Sergey Podatelev <br...@gmail.com>.
Thanks for your reply, Marcel.

On Thu, Feb 4, 2010 at 8:12 PM, Marcel Reutegger
<ma...@gmx.net> wrote:

> is that a single threaded environment or is it possible that multiple
> threads share a session? that's not supported.

The Session instance is only utilized by a single thread.

> what's the root cause of the exception?

Here goes, the silly-looking printouts are those that I put in
SearchIndex myself:

2010-02-05 19:48:20,794 DEBUG observation.ObservationDispatcher -
notifying 3 synchronous listeners.
2010-02-05 19:48:20,794 DEBUG core.SearchManager - onEvent: indexing started
2010-02-05 19:48:20,795 DEBUG lucene.SearchIndex - search index: updateNodes()
2010-02-05 19:48:20,796 DEBUG lucene.SearchIndex - trying to create document
2010-02-05 19:48:20,797 WARN  lucene.SearchIndex - Exception while
creating document for node: 4977b098-bba2-4274-ae7d-7575d2679526:
javax.jcr.RepositoryException: Error while indexing node:
4977b098-bba2-4274-ae7d-7575d2679526 of type:
{http://www.jcp.org/jcr/nt/1.0}unstructured:
4977b098-bba2-4274-ae7d-7575d2679526/{}date-updated:
4977b098-bba2-4274-ae7d-7575d2679526/{}date-updated

2010-02-05 19:48:20,798 DEBUG lucene.SearchIndex - whoopsy, failed!,
root cause: org.apache.jackrabbit.core.state.NoSuchItemStateException:
4977b098-bba2-4274-ae7d-7575d2679526/{}date-updated

2010-02-05 19:48:20,798 DEBUG lucene.SearchIndex - trying to create document
2010-02-05 19:48:20,800 DEBUG core.SearchManager - onEvent: indexing
finished in 6 ms.

> are you able to re-index the workspace after such an event or is the
> data in the persistence manager corrupt as well?

If I just restart the app or the server, JackRabbit says the redo.log
is empty, so there's nothing to re-index, thus the state of the
indexes stays wrong, and I still get wrong query results.

Removing the contents of $home$/index and
$home$/workspaces/default/index directories and then restarting the
server, restores proper index.

-- 
sp

Re: Index corruption

Posted by Marcel Reutegger <ma...@gmx.net>.
Hi,

is that a single threaded environment or is it possible that multiple
threads share a session? that's not supported.

what's the root cause of the exception?

are you able to re-index the workspace after such an event or is the
data in the persistence manager corrupt as well?

regards
 marcel

2010/2/2 Sergey Podatelev <br...@gmail.com>:
> Hello,
>
> I'm using Jackrabbit 1.5.3 and sometimes I run into index corruption
> problem. I can't consistently reproduce the issue, so upgrading to 1.6
> or 2.0 is not an option for me, as I'm not sure my exact issue is
> solved there.
> I'm also not in the liberty of exposing my actual repository
> structure, so what I'm asking here is some pointers where exactly too
> look at.
>
> Okay, here goes.
> The structure of the repo is the following:
>
> jcr:root <- Node 1 <- Node 2 <- Node 3
>
> The scenario at which I'm (sometimes) getting this corruption is the following:
>
> - move Node 3 to be the kid of Node 1
> - remove Node 2
> - save session
> - remove Node 3
> - SS
> - append Node 4 to Node 1
> - SS
> - append Node 5 to Node 4
> - SS
>
> - perform processing of the resulting tree, which includes reading the
> tree and properties being added and removed to all nodes sequentially.
> once each node is processed, session.save() is called;
>
> During the processing, I sometimes get the following exception:
>
> 2010-01-29 21:39:50,456 DEBUG observation.ObservationDispatcher -
> notifying 3 synchronous listeners.
> 2010-01-29 21:39:50,456 DEBUG core.SearchManager - onEvent: indexing started
> 2010-01-29 21:39:50,461 WARN  lucene.SearchIndex - Exception while
> creating document for node: 1cea61c4-8e20-4aed-b1f2-21600ff65101:
> javax.jcr.RepositoryException: Error while indexing node:
> 1cea61c4-8e20-4aed-b1f2-21600ff65101 of type:
> {http://www.jcp.org/jcr/nt/1.0}unstructured:
> 1cea61c4-8e20-4aed-b1f2-21600ff65101/{}date-updated:
> 1cea61c4-8e20-4aed-b1f2-21600ff65101/{}date-updated
>
> It is this exception which leads to corrupted index once the
> processing is done. If it's thrown during the processing, I then have
> query that looks like "//Node" (which is supposed to return 3 elements
> currently representing the "tree") only return one element -- Node 1.
>
> Perhaps it also worth noticing that besides this "Node N" elements,
> there's a subtree of about 10-15MB of data in the repo, not sure if
> this is relevant.
>
> Anyway, if someone can give me any insights regarding what might cause
> this issue or how can I reproduce it consistently, that would be
> great.
>
> Manually removing index files from fs datastore and restarting the app
> is not a suitable workaround for me.
>
>
>
> --
> sp
>