You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@jackrabbit.apache.org by Olivier Dony <ol...@denali.be> on 2009/03/31 00:30:35 UTC

Months of repository data lost after jackrabbit-server restart?

Hello everyone,

We've been running a web application using a jackrabbit-1.1 server  
accessed through RMI for a couple of years.
The repository is using a MySQL-based SimpleDbPersistenceManager, and  
the LocalFileSystem FS.

It has been running flawlessly until now, so we've seen no need to  
upgrade the jackrabbit-server.

Today we had to restart the jackrabbit-server for a maintenance  
operation. After the restart, we noticed that most of the repository  
was gone, with thousands of UUIDs missing. Our queries now return  
exceptions such as:

   javax.jcr.RepositoryException: failed to retrieve state of  
intermediary node: cc4b7c66-de69-47d3-a573-4a6dc58f42d1

We've examined the jackrabbit-server logs, and indeed they were kind  
of filled with PM "broken pipe" exceptions, over and over (logs  
included at the end). We had experienced this kind of MySQL no- 
activity disconnection before, but it has never been silent, and we  
used to notice when the repository was not accessible anymore (many  
ItemNotFoundExceptions)
In this case however, it looks like the repository continued to work,  
so we never noticed anything wrong. Everything was ok: creating nodes,  
uploading files, adding meta-data, versioning, reading, etc.
The "broken pipe" exceptions are the only noteworthy errors we can see  
in the log since the last restart of the jackrabbit-server, in  
december 2008.

How is this possible? Was the jackrabbit-server caching all the  
metadata without ever needing the database?
It's frightening because the MySQL database files for the jackrabbit  
db have a modification time dating back in June 2008! The jackrabbit- 
server has been restarted since then, and it was still working, so  
what is happening?


We've forced a consistency check, and it gave thousands of the  
messages similar to these (with varying UUIDs):

   *INFO * ConsistencyCheck: Removing deleted node from index:  
6b446f35-3fda-4c2c-b0fd-1b95910fde8e (ConsistencyCheck.java, line 386)
   (...)
   *WARN * ConsistencyCheck: Not repairable: Node 42e763b8-4ee0-4771- 
a0c1-844b8099fef7 has unknown parent: 70591d10-6887-4644- 
af24-7fc3b6a5478f (ConsistencyCheck.java, line 118)
   (...)

followed by this dramatic conclusion:

   *INFO * ConsistencyCheck: Repaired 5174 errors.  
(ConsistencyCheck.java, line 132)
   *WARN * ConsistencyCheck: 4995 error(s) not repairable.  
(ConsistencyCheck.java, line 134)


Now the real question is:  what can we do to recover even a part of  
our lost data? We do have incremental backups of the MySQL and the  
filesystem structure, but it looks like the MySQL database had indeed  
not been updated since June 2008, so our backups can't help...
The filesystem structure on the disk probably contains all blobs, but  
is it possible to retrieve anything in that cryptic tree structure?  
For example, is there a way to locate the jcr:content property of a  
node, based on a known (missing) UUID?

Does anyone have any advice to offer?
Anything else we need to look for in our logs or data?

Thanks a lot in advance!


--
Olivier Dony





------ EXCEPTIONS:

  *WARN * DatabasePersistenceManager: execute failed, about to  
reconnect... (DatabasePersistenceManager.java, line 843)
  *ERROR* DatabasePersistenceManager: failed closing Statement  
(DatabasePersistenceManager.java, line 925)
  *ERROR* DatabasePersistenceManager:     reason: Communications link  
failure due to underlying exception:

     ** BEGIN NESTED EXCEPTION **

     java.net.SocketException
     MESSAGE: Broken pipe

     STACKTRACE:

     java.net.SocketException: Broken pipe
           at java.net.SocketOutputStream.socketWrite0(Native Method)
           at  
java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
           at  
java.net.SocketOutputStream.write(SocketOutputStream.java:136)
           at  
java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
           at  
java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
           at com.mysql.jdbc.MysqlIO.send(MysqlIO.java:2739)
           at com.mysql.jdbc.MysqlIO.send(MysqlIO.java:2650)
           at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1581)
           at  
com 
.mysql 
.jdbc.ServerPreparedStatement.realClose(ServerPreparedStatement.java: 
897)
           at  
com 
.mysql.jdbc.ServerPreparedStatement.close(ServerPreparedStatement.java: 
463)
           at  
org 
.apache 
.jackrabbit 
.core 
.persistence 
.db 
.DatabasePersistenceManager 
.closeStatement(DatabasePersistenceManager.java:916)
           at  
org 
.apache 
.jackrabbit 
.core 
.persistence 
.db 
.DatabasePersistenceManager 
.reestablishConnection(DatabasePersistenceManager.java:781)
           at  
org 
.apache 
.jackrabbit 
.core 
.persistence 
.db 
.DatabasePersistenceManager 
.executeStmt(DatabasePersistenceManager.java:846)
           at  
org 
.apache 
.jackrabbit 
.core 
.persistence 
.db.DatabasePersistenceManager.exists(DatabasePersistenceManager.java: 
647)
           at  
org 
.apache 
.jackrabbit 
.core 
.state 
.SharedItemStateManager 
.hasNonVirtualItemState(SharedItemStateManager.java:1050)
           at  
org 
.apache 
.jackrabbit 
.core 
.state.SharedItemStateManager.hasItemState(SharedItemStateManager.java: 
287)
           at  
org 
.apache 
.jackrabbit 
.core 
.state.LocalItemStateManager.hasItemState(LocalItemStateManager.java: 
180)
           at  
org 
.apache 
.jackrabbit 
.core.state.XAItemStateManager.hasItemState(XAItemStateManager.java:252)
           at  
org 
.apache 
.jackrabbit 
.core 
.state 
.SessionItemStateManager.getItemState(SessionItemStateManager.java:163)
           at  
org 
.apache 
.jackrabbit 
.core.HierarchyManagerImpl.getItemState(HierarchyManagerImpl.java:120)
           at  
org 
.apache 
.jackrabbit 
.core.HierarchyManagerImpl.resolvePath(HierarchyManagerImpl.java:267)
           at  
org 
.apache 
.jackrabbit 
.core.CachingHierarchyManager.resolvePath(CachingHierarchyManager.java: 
132)
           at  
org 
.apache 
.jackrabbit 
.core.HierarchyManagerImpl.resolvePath(HierarchyManagerImpl.java:267)
           at  
org 
.apache 
.jackrabbit 
.core.CachingHierarchyManager.resolvePath(CachingHierarchyManager.java: 
132)
           at  
org 
.apache 
.jackrabbit 
.core.HierarchyManagerImpl.resolvePath(HierarchyManagerImpl.java:267)
           at  
org 
.apache 
.jackrabbit 
.core.CachingHierarchyManager.resolvePath(CachingHierarchyManager.java: 
132)
           at  
org 
.apache 
.jackrabbit 
.core.HierarchyManagerImpl.resolvePath(HierarchyManagerImpl.java:203)
           at  
org 
.apache 
.jackrabbit 
.core.HierarchyManagerImpl.resolvePath(HierarchyManagerImpl.java:341)
           at  
org 
.apache 
.jackrabbit 
.core.CachingHierarchyManager.resolvePath(CachingHierarchyManager.java: 
193)
           at  
org 
.apache.jackrabbit.core.NodeImpl.resolveRelativeNodePath(NodeImpl.java: 
243)
           at  
org.apache.jackrabbit.core.NodeImpl.getNode(NodeImpl.java:2412)
           at  
org.apache.jackrabbit.rmi.server.ServerNode.getNode(ServerNode.java:227)
           at sun.reflect.GeneratedMethodAccessor1727.invoke(Unknown  
Source)
           at  
sun 
.reflect 
.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java: 
25)
           at java.lang.reflect.Method.invoke(Method.java:597)
           at  
sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:305)


    Last packet sent to the server was 1 ms ago.  
(DatabasePersistenceManager.java, line 927)
    *ERROR* DatabasePersistenceManager: state/code: 08S01/0  
(DatabasePersistenceManager.java, line 928)
    *ERROR* DatabasePersistenceManager: failed closing Statement  
(DatabasePersistenceManager.java, line 925)
    *ERROR* DatabasePersistenceManager:     reason: Communications  
link failure due to underlying exception:
    ** BEGIN NESTED EXCEPTION **
    java.net.SocketException
    MESSAGE: Broken pipe
    STACKTRACE:
    java.net.SocketException: Broken pipe
      at java.net.SocketOutputStream.socketWrite0(Native Method)
      at  
java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
      at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
      at  
java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
      at java.io.BufferedOutputStream.flush(BufferedOutputStream.java: 
123)
      at com.mysql.jdbc.MysqlIO.send(MysqlIO.java:2739)
      at com.mysql.jdbc.MysqlIO.send(MysqlIO.java:2650)
      at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1581)
      at  
com 
.mysql 
.jdbc.ServerPreparedStatement.realClose(ServerPreparedStatement.java: 
897)
      at  
com 
.mysql.jdbc.ServerPreparedStatement.close(ServerPreparedStatement.java: 
463)
      at  
org 
.apache 
.jackrabbit 
.core 
.persistence 
.db 
.DatabasePersistenceManager 
.closeStatement(DatabasePersistenceManager.java:916)
      at  
org 
.apache 
.jackrabbit 
.core 
.persistence 
.db 
.DatabasePersistenceManager 
.reestablishConnection(DatabasePersistenceManager.java:781)
      at  
org 
.apache 
.jackrabbit 
.core 
.persistence 
.db 
.DatabasePersistenceManager 
.executeStmt(DatabasePersistenceManager.java:846)
      at  
org 
.apache 
.jackrabbit 
.core 
.persistence 
.db.DatabasePersistenceManager.exists(DatabasePersistenceManager.java: 
647)
      at  
org 
.apache 
.jackrabbit 
.core 
.state 
.SharedItemStateManager 
.hasNonVirtualItemState(SharedItemStateManager.java:1050)
      at  
org 
.apache 
.jackrabbit 
.core 
.state.SharedItemStateManager.hasItemState(SharedItemStateManager.java: 
287)
      at  
org 
.apache 
.jackrabbit 
.core 
.state.LocalItemStateManager.hasItemState(LocalItemStateManager.java: 
180)
      at  
org 
.apache 
.jackrabbit 
.core.state.XAItemStateManager.hasItemState(XAItemStateManager.java:252)
      at  
org 
.apache 
.jackrabbit 
.core 
.state 
.SessionItemStateManager.getItemState(SessionItemStateManager.java:163)
      at  
org 
.apache 
.jackrabbit 
.core.HierarchyManagerImpl.getItemState(HierarchyManagerImpl.java:120)
      at  
org 
.apache 
.jackrabbit 
.core.HierarchyManagerImpl.resolvePath(HierarchyManagerImpl.java:267)
      at  
org 
.apache 
.jackrabbit 
.core.CachingHierarchyManager.resolvePath(CachingHierarchyManager.java: 
132)
      at  
org 
.apache 
.jackrabbit 
.core.HierarchyManagerImpl.resolvePath(HierarchyManagerImpl.java:203)
      at  
org 
.apache 
.jackrabbit 
.core.CachingHierarchyManager.resolvePath(CachingHierarchyManager.java: 
200)
      at  
org 
.apache.jackrabbit.core.NodeImpl.resolveRelativeNodePath(NodeImpl.java: 
243)
      at org.apache.jackrabbit.core.NodeImpl.getNode(NodeImpl.java:2412)
      at  
org.apache.jackrabbit.rmi.server.ServerNode.getNode(ServerNode.java:227)
      at sun.reflect.GeneratedMethodAccessor1718.invoke(Unknown Source)
      at  
sun 
.reflect 
.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java: 
25)
      at java.lang.reflect.Method.invoke(Method.java:597)


Re: [SOLVED] Months of repository data lost after jackrabbit-server restart?

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Tue, Mar 31, 2009 at 11:42 PM, Olivier Dony <ol...@denali.be> wrote:
> Sorry for the late answer, at the end of a long day of investigation ;-)
> First, thank you for the quick reply, your explanations put us on the
> right track.
>
> The short version is: it seems the culprit was in fact the MySQL database,
> although we still don't understand how that's even possible.
> [...]
> And yes, after restarting jackrabbit-server everything is back!

Excellent, thanks for sharing this!

BR,

Jukka Zitting

Re: [SOLVED] Months of repository data lost after jackrabbit-server restart?

Posted by Olivier Dony <ol...@denali.be>.
Hello Jukka,

Sorry for the late answer, at the end of a long day of investigation ;-)
First, thank you for the quick reply, your explanations put us on the
right track.

The short version is: it seems the culprit was in fact the MySQL  
database,
although we still don't understand how that's even possible.
The long version is in the text below.


On 31 Mar 2009, at 01:34, Jukka Zitting wrote:
> Seems weird. A save() call can only succeed if the underlying
> persistence database has successfully committed the changes. Does your
> log show any exceptions or problems related to save() calls?

We've analyzed the logs even further but never found any exception or  
problem
during a save operation, which is reassuring. I guess the "broken pipe"
exceptions occur after each period of database inactivity (when closing
statements?), but whenever new data needs to be saved, the PM manages to
establish a new connection.


> Jackrabbit does cache content, but the content from one session only
> becomes available to another session after it has been successfully
> persisted.

Indeed, after further tests that seems to be the case. Great to know.


> So unless you only ever used a single session, there should
> be no way for your data to be kept *only* in a cache unless the
> underlying database was seriously broken (i.e. it returned success for
> uncompleted commits).

Well, it looks like the database *was* seriously broken. The db files on
the disk had not been touched in months (making our backups useless),
however the binary logs did contain all the jackrabbit transactions!

So it looks like MySQL had been writing to /invisible/ database files
somewhere (perhaps unlinked or something), or not at all. And then
it suddenly recovered yesterday, only to synchronize with the *old* db
files on the disk. We don't know what triggered that, as the only
noticeable thing that happened to the MySQL server yesterday is that
the disk where the binary logs are  written became full for a little
while.
The MySQL server's uptime is 300+ days (and counting), and we have
never altered its config. Moreover, our other MySQL databases on the
same server did not experience the same issue.
So that's still a mystery... (MySQL 5.0.44 on Gentoo 2.6.24.2)

Fortunately we also had backups of the binary logs, so we restored older
database dumps and then replayed the whole binary logs range since June
2008 over it.
And yes, after restarting jackrabbit-server everything is back!


Thanks a lot for your help!

Cheers,

--
Olivier Dony


Re: Months of repository data lost after jackrabbit-server restart?

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Tue, Mar 31, 2009 at 12:30 AM, Olivier Dony <ol...@denali.be> wrote:
> In this case however, it looks like the repository continued to work, so we
> never noticed anything wrong. Everything was ok: creating nodes, uploading
> files, adding meta-data, versioning, reading, etc.
> The "broken pipe" exceptions are the only noteworthy errors we can see in
> the log since the last restart of the jackrabbit-server, in december 2008.
>
> How is this possible?

Seems weird. A save() call can only succeed if the underlying
persistence database has successfully committed the changes. Does your
log show any exceptions or problems related to save() calls?

> Was the jackrabbit-server caching all the metadata
> without ever needing the database?

Jackrabbit does cache content, but the content from one session only
becomes available to another session after it has been successfully
persisted. So unless you only ever used a single session, there should
be no way for your data to be kept *only* in a cache unless the
underlying database was seriously broken (i.e. it returned success for
uncompleted commits).

> It's frightening because the MySQL database files for the jackrabbit db have
> a modification time dating back in June 2008! The jackrabbit-server has been
> restarted since then, and it was still working, so what is happening?

If you did restart Jackrabbit since then and you're seeing more recent
content, then clearly that content is stored somewhere other than
Jackrabbit caches. Are you sure you haven't accidentally changed the
Jackrabbit configuration to point to some other database? Has there
been any major configuration/deployment changes in the MySQL database
you're using?

> Now the real question is:  what can we do to recover even a part of our lost
> data? We do have incremental backups of the MySQL and the filesystem
> structure, but it looks like the MySQL database had indeed not been updated
> since June 2008, so our backups can't help...

First of all make sure that your backups are safe. Whatever you do, do
it on a copy of your backups so you can't accidentally destroy things.
Note also that the Jackrabbit search indexes will contain quite a lot
of information about the repository content. In the extreme case of
all data being gone from the persistence database, you should still be
able to salvage something from the search index. If you have backups
of the search index, make sure those backups are safe.

Next, I'd investigate what really went wrong. Why did you start seeing
the pipe errors in the first place? What log messages are there for
save() operations? Has there been any recent configuration/deployment
changes or upgrades to the MySQL or Jackrabbit installations? What was
the exact Jackrabbit configuration (repository.xml + workspace.xml
files) you used?

Once you know what went wrong, you'll have a better picture of where
your content could be found.

BR,

Jukka Zitting