Posted to dev@subversion.apache.org by Chris Wilson <ch...@netservers.co.uk> on 2003/06/09 09:49:18 UTC

Repository broken, recover fails

Hi all,

I came into work this morning to find a weird problem. I cannot check 
anything into the repository:

  [root@chris arp_antidote]# svn ci -F ../svn-commit.tmp .
  Adding         arp_antidote/antidote2.diff
  Deleting       arp_antidote/antidote2.diff.gz
  svn: Berkeley DB error
  svn: Commit failed (details follow):
  svn: Berkeley DB error while appending string for filesystem 
    /home/svn/root/db:
  DB_RUNRECOVERY: Fatal error, run database recovery

So I ran recover:

  [root@chris arp_antidote]# svnadmin recover /home/svn/root
  Acquiring exclusive lock on repository db.
  Recovery is running, please stand by...svn: Berkeley DB error
  svn: DB_INCOMPLETE: Cache flush was unable to complete

I ran it under strace and observed:

open("/home/svn/root/db/log.0000010786", O_RDONLY|O_LARGEFILE) = 4
fcntl64(4, F_SETFD, FD_CLOEXEC)         = 0
_llseek(4, 300064, [300064], SEEK_SET)  = 0
read(4, "\350\223\4\0003\204Gv@\0\0\0001\0\0\0\324*\4\200\"*\0\0"..., 64) = 64
time([1055151215])                      = 1055151215
select(0, NULL, NULL, NULL, {1, 0})     = 0 (Timeout)
select(0, NULL, NULL, NULL, {2, 0})     = 0 (Timeout)
select(0, NULL, NULL, NULL, {4, 0})     = 0 (Timeout)
close(4)                                = 0
...
write(2, "svn: Berkeley DB error\n", 23svn: Berkeley DB error
) = 23

This seems very weird: I can't imagine why it sleeps for a few seconds 
(1, 2 and then 4) before printing the error.

I ran recover several times, and at one point it claimed to have worked:

  [root@chris arp_antidote]# svnadmin recover /home/svn/root
  Acquiring exclusive lock on repository db.
  Recovery is running, please stand by...
  Recovery completed.
  The latest repos revision is 1231.

Then I can at least check stuff out. But I tried the same checkin, and it
broke the repository again.

I hope someone can help me with this problem, since it's blocking an 
important task. Also, I'm afraid to do other checkins in case they break 
it too.

Cheers, Chris.
-- 
   ___ __     _
 / __// / ,__(_)_  | Chris Wilson -- UNIX Firewall Lead Developer |
/ (_ / ,\/ _/ /_ \ | NetServers.co.uk http://www.netservers.co.uk |
\ _//_/_/_//_/___/ | 21 Signet Court, Cambridge, UK. 01223 576516 |


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Repository broken, recover fails

Posted by Chris Wilson <ch...@netservers.co.uk>.
Hi Greg,

> > I hope someone can help me with this problem, since it's blocking an 
> > important task. Also, I'm afraid to do other checkins in case they break 
> > it too.
> 
> You're doing your recovery as 'root'. Are you sure that you've reset all of
> the ownership/group values back? You could be messing up your database by
> allowing some files to be written, but not others.

Whenever I remembered (at least once), I did a recursive chown of the 
repository after svnadmin recover, and it was still broken. But thanks 
for the reminder =)

Cheers, Chris.
-- 
   ___ __     _
 / __// / ,__(_)_  | Chris Wilson -- UNIX Firewall Lead Developer |
/ (_ / ,\/ _/ /_ \ | NetServers.co.uk http://www.netservers.co.uk |
\ _//_/_/_//_/___/ | 21 Signet Court, Cambridge, UK. 01223 576516 |



Re: Repository broken, recover fails

Posted by Greg Stein <gs...@lyra.org>.
On Mon, Jun 09, 2003 at 10:49:18AM +0100, Chris Wilson wrote:
>...
> I ran recover several times, and at one point it claimed to have worked:
> 
>   [root@chris arp_antidote]# svnadmin recover /home/svn/root
>   Acquiring exclusive lock on repository db.
>   Recovery is running, please stand by...
>   Recovery completed.
>   The latest repos revision is 1231.
> 
> Then I can at least check stuff out. But I tried the same checkin, and it
> broke the repository again.
> 
> I hope someone can help me with this problem, since it's blocking an 
> important task. Also, I'm afraid to do other checkins in case they break 
> it too.

You're doing your recovery as 'root'. Are you sure that you've reset all of
the ownership/group values back? You could be messing up your database by
allowing some files to be written, but not others.
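
To check this concretely, a minimal sketch along the following lines lists
any file under the repository that is not owned by the expected account.
The function name list_foreign_files and the account name "apache" are
assumptions for illustration only; substitute whatever user your server
actually runs as.

```shell
# Sketch only: list anything in the repository NOT owned by the expected user.
# list_foreign_files is a hypothetical helper, not part of Subversion.
list_foreign_files() {
    repo=$1     # repository root, e.g. /home/svn/root
    owner=$2    # user name (or numeric uid) that should own every file
    # find prints each path whose owner differs from $owner.
    find "$repo" ! -user "$owner" -print
}

# Example (not run here; "apache" is an assumed account name):
#   list_foreign_files /home/svn/root apache
# Empty output means ownership is consistent. Otherwise a recursive
# chown, e.g. "chown -R apache /home/svn/root", restores it.
```

Running the check right after every 'svnadmin recover' done as root would
show whether recovery left root-owned log or database files behind.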

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/


Re: Repository broken, recover fails

Posted by Chris Wilson <ch...@netservers.co.uk>.
Hi there,

> Unfortunately it's a bit difficult at the moment. I upgraded from 
> 0.17 to 0.23 in the hope of fixing the problem, and discovered that the 
> new version can't read my old repository. So I'm in the middle of 
> restoring it from a dump.

Having upgraded to 0.23 and rebuilt the repository from the dump, the 
problem seems to have fixed itself. However, I'm still a little worried 
that the same problem may occur again in future, since it took a whole day 
to recover from this one =(
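
For reference, the dump-and-reload cycle described above can be sketched
roughly as follows. It assumes the old release's svnadmin is still
available to produce the dump, since the new one cannot read the old
repository. The variables OLD_SVNADMIN and NEW_SVNADMIN and the function
rebuild_from_dump are hypothetical names introduced for this sketch.

```shell
# Sketch only: rebuild a repository through a dump file.
# OLD_SVNADMIN / NEW_SVNADMIN are hypothetical variables naming the
# svnadmin binaries of the old and new releases; adjust to your install.
OLD_SVNADMIN=${OLD_SVNADMIN:-svnadmin}
NEW_SVNADMIN=${NEW_SVNADMIN:-svnadmin}

rebuild_from_dump() {
    old_repo=$1    # existing repository, e.g. /home/svn/root
    new_repo=$2    # path for the rebuilt repository (must not exist yet)
    dumpfile=$3    # where to keep the intermediate dump

    # Dump every revision with the binary that can still read the old format.
    "$OLD_SVNADMIN" dump "$old_repo" > "$dumpfile" || return 1
    # Create an empty repository with the new release and replay the dump.
    "$NEW_SVNADMIN" create "$new_repo" || return 1
    "$NEW_SVNADMIN" load "$new_repo" < "$dumpfile"
}
```

Keeping the dump file around afterwards gives a format-independent backup,
which may shorten the next recovery.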

Cheers, Chris.
-- 
   ___ __     _
 / __// / ,__(_)_  | Chris Wilson -- UNIX Firewall Lead Developer |
/ (_ / ,\/ _/ /_ \ | NetServers.co.uk http://www.netservers.co.uk |
\ _//_/_/_//_/___/ | 21 Signet Court, Cambridge, UK. 01223 576516 |



Re: Repository broken, recover fails

Posted by Chris Wilson <ch...@netservers.co.uk>.
Hi there,

> /me frantically tries to hack into gstein and ghudson's accounts
> and remove this mail before they get it :-)

Hehe =)

> That would be ra_dav.  Can you check your Apache error_log and see if
> there are any segfaults happening?  If so, shut down Apache, and then
> if you are familiar with gdb, try to just run 'httpd -X' under gdb,
> repeat your commit, and see if gdb halts on a SEGFAULT or something.

Unfortunately it's a bit difficult at the moment. I upgraded from 
0.17 to 0.23 in the hope of fixing the problem, and discovered that the 
new version can't read my old repository. So I'm in the middle of 
restoring it from a dump.

Sorry for not mentioning that I was using an old version, I guess this
problem might already be fixed. If not, then I'll try to run Apache under
gdb and report what I find.

I found some messages in the Apache error logs that might explain things 
to someone:

[Mon Jun 09 10:29:14 2003] [error] [client 127.0.0.1] Could not DELETE 
  /svn/!svn/wrk/04a6b279-a7bf-0310-9261-8f56e76b0230/branch/firerack/teql/kernel/sources/arp_antidote/antidote2.diff.gz.  [500, #0]
[Mon Jun 09 10:29:14 2003] [error] [client 127.0.0.1] Could not delete the 
  resource.  [500, #160029]
[Mon Jun 09 10:29:14 2003] [error] [client 127.0.0.1] (17)File exists: 
  Berkeley DB error while appending string for filesystem 
  /home/svn/root/db:
DB_RUNRECOVERY: Fatal error, run database recovery [500, #160029]
[Mon Jun 09 10:29:14 2003] [error] [client 127.0.0.1] (20014)Error string 
  not specified yet: Berkeley DB error while closing `nodes' database for 
  filesystem /home/svn/root/db:
DB_RUNRECOVERY: Fatal error, run database recovery

However, there are no segfaults or other apparent crashes of Apache, 
except warnings like:

[Mon Jun 09 10:33:54 2003] [warn] child process 1706 still did not exit, 
  sending a SIGTERM

Cheers, Chris.
-- 
   ___ __     _
 / __// / ,__(_)_  | Chris Wilson -- UNIX Firewall Lead Developer |
/ (_ / ,\/ _/ /_ \ | NetServers.co.uk http://www.netservers.co.uk |
\ _//_/_/_//_/___/ | 21 Signet Court, Cambridge, UK. 01223 576516 |




Re: Repository broken, recover fails

Posted by cm...@collab.net.
Chris Wilson <ch...@netservers.co.uk> writes:

> > Is this over ra_local, ra_dav, or ra_svn?
> 
> Using Apache, which I guess is ra_svn?

/me frantically tries to hack into gstein and ghudson's accounts
and remove this mail before they get it :-)

That would be ra_dav.  Can you check your Apache error_log and see if
there are any segfaults happening?  If so, shut down Apache, and then
if you are familiar with gdb, try to just run 'httpd -X' under gdb,
repeat your commit, and see if gdb halts on a SEGFAULT or something.
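
As a starting point for the log check, a small sketch like this counts
crash notices in an error log. It assumes httpd reports a crashed child
with a line containing "exit signal" (for example "exit signal
Segmentation fault (11)"); that wording, the function name
count_child_crashes, and any log path you pass in are assumptions to
verify against your own install.

```shell
# Sketch only: count child-crash notices in an Apache error log.
# Assumes crashed children are logged with a line containing "exit signal";
# count_child_crashes is a hypothetical helper name.
count_child_crashes() {
    logfile=$1
    # grep -c prints the number of matching lines; it exits non-zero when
    # there are none, so "|| true" keeps the function's status at 0.
    grep -c 'exit signal' "$logfile" || true
}

# Example (not run here; the path is an assumption):
#   count_child_crashes /var/log/httpd/error_log
```

If the count is non-zero, the gdb session suggested above would look
something like 'gdb --args httpd -X' followed by 'run' at the gdb prompt,
assuming httpd is on the PATH; the commit is then repeated from another
terminal.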


Re: Repository broken, recover fails

Posted by Chris Wilson <ch...@netservers.co.uk>.
Hi there,

> (If you aren't doing so already, always make sure that the recovery
> process is the only one accessing the repository.)

Yes, I shut down Apache before repairing.

> Now, if this is repeatable, can you debug? 

Where do I start? I'm not familiar with BDB nor the Subversion source.

> Do you have any
> post-commit hooks that access the repository that might be crashing?

No hooks at present.

> Is this over ra_local, ra_dav, or ra_svn?

Using Apache, which I guess is ra_svn?

Cheers, Chris.
-- 
   ___ __     _
 / __// / ,__(_)_  | Chris Wilson -- UNIX Firewall Lead Developer |
/ (_ / ,\/ _/ /_ \ | NetServers.co.uk http://www.netservers.co.uk |
\ _//_/_/_//_/___/ | 21 Signet Court, Cambridge, UK. 01223 576516 |



Re: Repository broken, recover fails

Posted by cm...@collab.net.
Chris Wilson <ch...@netservers.co.uk> writes:

> I ran recover several times, and at one point it claimed to have worked:
> 
>   [root@chris arp_antidote]# svnadmin recover /home/svn/root
>   Acquiring exclusive lock on repository db.
>   Recovery is running, please stand by...
>   Recovery completed.
>   The latest repos revision is 1231.
> 
> Then I can at least check stuff out. But I tried the same checkin, and it
> broke the repository again.

(If you aren't doing so already, always make sure that the recovery
process is the only one accessing the repository.)

Now, if this is repeatable, can you debug?  Do you have any
post-commit hooks that access the repository that might be crashing?
Is this over ra_local, ra_dav, or ra_svn?
