Posted to dev@subversion.apache.org by Petr Sebor <pe...@scssoft.com> on 2002/11/26 23:49:30 UTC

database destroyed

Hello,

Yup, the subject says it all, and this is not the first time it has
happened to me.
I have a cron script set up to make an automated backup of the database
via 'svnadmin dump ...'.
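
A minimal sketch of what such a cron backup job might look like. The
function name backup_repo, the SVNADMIN override variable and the paths in
the usage note are assumptions for illustration, not the script actually
used here. The dump is written to a temporary file first, so a failed or
interrupted dump never replaces the previous good backup.

```shell
#!/bin/sh
# SVNADMIN is a hypothetical override hook; it defaults to the real svnadmin.
: "${SVNADMIN:=svnadmin}"

# backup_repo REPOS DEST
# Dump REPOS into DEST, keeping the old DEST if the dump fails.
backup_repo() {
    repos=$1
    dest=$2
    tmp="$dest.tmp.$$"
    # 'svnadmin dump' writes the dump stream to stdout.
    if "$SVNADMIN" dump "$repos" > "$tmp"; then
        mv "$tmp" "$dest"
    else
        # Leave the previous backup untouched and report the failure.
        rm -f "$tmp"
        return 1
    fi
}
```

From cron this could be called as, for example,
backup_repo /devel/svn/repos /backup/repos.dump (the backup path is made up).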

When someone happens to operate on the database (co/up) while the dump
is running, it always ends up corrupt (so I have to run db_recover on
it) or, worse, the db/__db.001 ... 005 files simply disappear.
The only cure I know is to svnadmin load the dump back in to regain
a working repository :-(

And I am really not sure why 'svnadmin dump' needs RW access to the
database?

This is from the error.log when I was doing a whole-tree checkout and,
somewhere in the middle, cron started dumping the database. The dump
finished successfully, but the moment it ended, everything went south....

[Wed Nov 27 00:06:48 2002] [error] [client 212.20.99.170] Could not fetch resource information.  [500, #0]
[Wed Nov 27 00:06:48 2002] [error] [client 212.20.99.170] Could not open the SVN filesystem at /devel/svn/repos  [500, #160029]
[Wed Nov 27 00:06:48 2002] [error] [client 212.20.99.170] (17)File exists: Berkeley DB error while opening environment for filesystem /devel/svn/repos/db:
DB_RUNRECOVERY: Fatal error, run database recovery  [500, #160029]

After that, the __db.00[1-5] files were missing in action :-(

Apache 2.0.43, Subversion 0.14.5-1 Debian package, Linux i686

Has anyone experienced this? I really have no clue what could be wrong :-(

Regards,
Petr



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: database destroyed

Posted by Ben Collins-Sussman <su...@collab.net>.
Petr Sebor <pe...@scssoft.com> writes:

> >
> >
> >That makes no sense... the db files *disappear*?  Dumping the database
> >is just another process that is reading data via libsvn_fs.  We should
> >have no problem with concurrent db readers.  In fact, we expect
> >multiple httpd child processes to do this all the time.
> >
> Makes no sense to me as well... I just got this error with a notice
> "you have to do DB_RECOVER"... so I did. And even though db_recover
> gives me no error, the __db.* files are simply gone. I don't know how
> this happened, but it is not the first time.

Did you run 'db_recover -ve -h repository/db/' ?  Or did you run
'svnadmin recover repository'?



Re: database destroyed

Posted by Peter Davis <pe...@pdavis.cx>.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Wednesday 27 November 2002 15:40, Ben Collins-Sussman wrote:
> Acquiring exclusive lock on repository db, and running recovery procedures.
> Please stand by...
[...]
> I think the problem here is that the actual recovery phase isn't
> giving the user any feedback that it's working.  :-(

Yes, I've definitely been bitten by this same problem before.  At least 
"Please stand by..." should be preceded by "Lock acquired; Recovering..." or 
something.  As it is, it sounds like it is telling me to stand by while it 
acquires the lock.

By the way, what is supposed to happen if recovery is aborted mid-process?

- -- 
Peter Davis
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.7 (GNU/Linux)

iD8DBQE95Vq0hDAgUT1yirARAggKAJ9uLpFGsyqRZBDcPPG9DHnqcOIGGACbBaf4
WrLiL4JCLumFWsYVeVVY5X8=
=qdrc
-----END PGP SIGNATURE-----



Re: database destroyed

Posted by Petr Sebor <pe...@scssoft.com>.
cmpilato@collab.net wrote:

>Petr Sebor <pe...@scssoft.com> writes:
>
>  
>
>>I was just doing 'svn up' and watching 'top' slowly crawl to the
>>~300MB barrier.
>>
>>[Just for the reference, it is the svn 0.14.5-1 debian package]
>>    
>>
>
>Yeah, I definitely nailed a bug with those symptoms relatively
>recently.  Let us know if you see the same results with the latest
>Subversion client, once you've upgraded.  Thanks!
>
Yes, the 0.15.0-1 client eats only up to ~24MB with the same data (while
the server ate about ~100MB of memory... 2078 files in the repository).
Unfortunately I don't know what the server's memory usage was with the
previous version, but still, 100MB seems a bit too much.

Best,
Petr


Re: database destroyed

Posted by cm...@collab.net.
Petr Sebor <pe...@scssoft.com> writes:

> I was just doing 'svn up' and watching 'top' slowly crawl to the
> ~300MB barrier.
> 
> [Just for the reference, it is the svn 0.14.5-1 debian package]

Yeah, I definitely nailed a bug with those symptoms relatively
recently.  Let us know if you see the same results with the latest
Subversion client, once you've upgraded.  Thanks!


Re: database destroyed

Posted by Petr Sebor <pe...@scssoft.com>.
Ben Collins-Sussman wrote:

>Oh, I wasn't sure Peter was talking about the size of httpd during an
>update.  I thought he was perhaps describing issue 985, which cmpilato
>fixed.  Phoo.
>
Sorry, it is getting late here and I am forgetting to provide elementary
information :-(
The OOM was on the client side. I have seen Apache consume all memory
with an older SVN, but that hasn't happened since... maybe I am just
lucky though.

I was just doing 'svn up' and watching 'top' slowly crawl to the
~300MB barrier.

[Just for the reference, it is the svn 0.14.5-1 debian package]

Best,
Petr



Re: database destroyed

Posted by Ben Collins-Sussman <su...@collab.net>.
Philip Martin <ph...@codematters.co.uk> writes:

> > Stop using svn 0.14.X.  These memory bugs were fixed in 0.15, and
> > we're about to release 0.16 next week.
> 
> If Petr is referring to httpd running out of memory, then as far as I
> know it's not fixed.  302MB is almost exactly what I reported for 1200
> files in issue 773
> 
> http://subversion.tigris.org/issues/show_bug.cgi?id=773

Oh, I wasn't sure Peter was talking about the size of httpd during an
update.  I thought he was perhaps describing issue 985, which cmpilato
fixed.  Phoo.


Re: database destroyed

Posted by Philip Martin <ph...@codematters.co.uk>.
Ben Collins-Sussman <su...@collab.net> writes:

> > Another strange thing is that I am getting OOM quite often on update
> > process... last time I tried updating a directory of 1136 files
> > (total of ~5.6MB of sources), according to 'top' it ate 302MB of
> > memory. (the update was getting a whole new directory with that
> > number of files)... on a machine with 256MB of physical memory and
> > 128MB swap, I am simply unable to update my source tree :-) I had to
> > shut down KDE & X to get somewhere and even then it was close :-)
> 
> Stop using svn 0.14.X.  These memory bugs were fixed in 0.15, and
> we're about to release 0.16 next week.

If Petr is referring to httpd running out of memory, then as far as I
know it's not fixed.  302MB is almost exactly what I reported for 1200
files in issue 773

http://subversion.tigris.org/issues/show_bug.cgi?id=773

It doesn't appear to be simple to fix :-(

-- 
Philip Martin


Re: database destroyed

Posted by Petr Sebor <pe...@scssoft.com>.
On Wed, Nov 27, 2002 at 05:40:45PM -0600, Ben Collins-Sussman wrote:
> Petr Sebor <pe...@scssoft.com> writes:
> 
> Did you see the "Please stand by..." just sit there, and then assumed
> it was hung?  It wasn't.  It was actually running recovery, which can
> take anywhere from 5 seconds to 5 minutes.  'svnadmin recover' never
> waits on an exclusive lock.  It *assumes* that you, the user, are 100%
> certain that no other processes are accessing the db, and then
> *forcibly* removes any locks sitting in the repository before starting
> recovery.
> 
> I think the problem here is that the actual recovery phase isn't
> giving the user any feedback that it's working.  :-(

Well, I gave it about 15 minutes once... nothing. strace showed svnadmin
stopped at a function that tried to lock the db.lock file, but no one was
accessing the repository at that time.

I know that svnadmin shows no output, but I am being patient. When
top shows there is activity, I'll happily wait for it
to finish. That's not the problem...

> Stop using svn 0.14.X.  These memory bugs were fixed in 0.15, and
> we're about to release 0.16 next week.

Oops... yes, I am still using the prepackaged debian 0.14.5-1

I'll either wait for the 0.16 package or try the latest from 
CVS.. oops .. SVN :-)

Thanks,
Petr


Re: database destroyed

Posted by Ben Collins-Sussman <su...@collab.net>.
Petr Sebor <pe...@scssoft.com> writes:

> I don't know why, but I am unable to use svnadmin recover most of
> the time. It says it is waiting for the exclusive lock and if not
> manually killed, it would wait forever it seems. But there is no one
> sitting at that lock according to lsof (hope that locks/db.lock is
> the proper file) and running the recovery by hand succeeds.

'svnadmin recover' shows this output:

Acquiring exclusive lock on repository db, and running recovery procedures.
Please stand by...
Recovery completed.


Did you see the "Please stand by..." just sit there, and then assumed
it was hung?  It wasn't.  It was actually running recovery, which can
take anywhere from 5 seconds to 5 minutes.  'svnadmin recover' never
waits on an exclusive lock.  It *assumes* that you, the user, are 100%
certain that no other processes are accessing the db, and then
*forcibly* removes any locks sitting in the repository before starting
recovery.

I think the problem here is that the actual recovery phase isn't
giving the user any feedback that it's working.  :-(

> Another strange thing is that I am getting OOM quite often on update
> process... last time I tried updating a directory of 1136 files
> (total of ~5.6MB of sources), according to 'top' it ate 302MB of
> memory. (the update was getting a whole new directory with that
> number of files)... on a machine with 256MB of physical memory and
> 128MB swap, I am simply unable to update my source tree :-) I had to
> shut down KDE & X to get somewhere and even then it was close :-)

Stop using svn 0.14.X.  These memory bugs were fixed in 0.15, and
we're about to release 0.16 next week.


Re: database destroyed

Posted by Petr Sebor <pe...@scssoft.com>.
cmpilato@collab.net wrote:

>db_recover without the -e will destroy the __db.* files.  You should
>probably just run 'svnadmin recover /path/to/repos', or if your
>Subversion rig predates the 'svnadmin recover' command, run
>'db_recover -veh /path/to/repos/db'.
>  
>
Looks like it was my fault then. I wasn't aware of the '-e' option...
It happened to me sometime around 3:00am, and what I probably
did was run db4.0_recover (Debian) first and then notice that the __db*
files were gone. Sorry for the confusion...

I don't know why, but I am unable to use svnadmin recover most
of the time. It says it is waiting for the exclusive lock and, if not
manually killed, it seems it would wait forever. But no one is holding
that lock according to lsof (I hope locks/db.lock is the proper file),
and running the recovery by hand succeeds.

Another strange thing is that I am getting OOM quite often during
updates... last time I tried updating a directory of 1136 files (~5.6MB
of sources in total), according to 'top' it ate 302MB of memory. (The
update was fetching a whole new directory with that number of files.) On
a machine with 256MB of physical memory and 128MB of swap, I am simply
unable to update my source tree :-) I had to shut down KDE & X to get
somewhere, and even then it was close :-)

Best,
Petr




Re: database destroyed

Posted by cm...@collab.net.
Petr Sebor <pe...@scssoft.com> writes:

> >
> >
> >That makes no sense... the db files *disappear*?  Dumping the database
> >is just another process that is reading data via libsvn_fs.  We should
> >have no problem with concurrent db readers.  In fact, we expect
> >multiple httpd child processes to do this all the time.
> >
> Makes no sense to me as well... I just got this error with a notice
> "you have to do DB_RECOVER"... so I did. And even though db_recover
> gives me no error, the __db.* files are simply gone. I don't know how
> this happened, but it is not the first time.

db_recover without the -e will destroy the __db.* files.  You should
probably just run 'svnadmin recover /path/to/repos', or if your
Subversion rig predates the 'svnadmin recover' command, run
'db_recover -veh /path/to/repos/db'.
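
A minimal sketch of the two recovery options named above. The helper name
recover_command and its second argument are hypothetical. It only prints
the command to run, so nothing touches the repository until you have
confirmed that no other process is using it.

```shell
#!/bin/sh
# recover_command REPOS [TOOL]
# Print the recovery command for REPOS. TOOL is "svnadmin" (the default)
# or "db_recover" for installations without 'svnadmin recover'.
recover_command() {
    repos=$1
    tool=${2:-svnadmin}
    case $tool in
        svnadmin)
            printf 'svnadmin recover %s\n' "$repos"
            ;;
        db_recover)
            # -v: verbose, -e: retain the environment (the __db.* files),
            # -h: Berkeley DB home directory.
            printf 'db_recover -veh %s/db\n' "$repos"
            ;;
        *)
            echo "unknown tool: $tool" >&2
            return 1
            ;;
    esac
}
```

For example, recover_command /path/to/repos db_recover prints the
db_recover invocation with the -e flag in place, which is the part that is
easy to forget at 3:00am.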


Re: database destroyed

Posted by Philip Martin <ph...@codematters.co.uk>.
Petr Sebor <pe...@scssoft.com> writes:

> According to the replies I got, it turned out to be my fault.
> I probably used db_recover without the '-e' option, which
> was the cause of the missing __db.* files. I wasn't aware of this...

As far as I know, loss of the __db files is not the end of the world,
running "svnadmin recover" will recreate them.

> As for the OOM, this is not reported in the Apache log file,
> though I have experienced an OOM with client version 0.14.5.
> [which has probably been fixed, I am told]

I seem to recall that nothing appears in the Apache log when the
Linux OOM killer is responsible; the only message is the one in the
syslog.  I may be mistaken, it's a long time since I caused httpd to die
that way.
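
A minimal sketch of that syslog check. The helper name find_oom_kills and
the log file locations in the usage note are assumptions; where the kernel
messages end up depends on the syslog configuration.

```shell
#!/bin/sh
# find_oom_kills FILE...
# Print the lines in the given log files that mention the kernel running
# out of memory. The match is case-insensitive because the exact wording
# differs between kernel versions.
find_oom_kills() {
    grep -i 'out of memory' "$@"
}
```

It could be called as find_oom_kills /var/log/syslog /var/log/messages;
grep exits non-zero when nothing matches, which is the good case here.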

-- 
Philip Martin


Re: database destroyed

Posted by Petr Sebor <pe...@scssoft.com>.
> 
>
>Is it simply that the svnadmin process uses enough memory to push the
>machine over its memory limit, and this causes the httpd process to
>fail?  Is there a message in the Apache log about a process being
>killed?  Or if you are running Linux there may be a message in the
>system log from the kernel OOM killer.
>
>  
>
>>__db.* files are gone.... [db_recover is quiet though]
>>    
>>
>
>However, running out of memory wouldn't explain why the __db files are
>missing.
>

According to the replies I got, it turned out to be my fault.
I probably used db_recover without the '-e' option, which
was the cause of the missing __db.* files. I wasn't aware of this...

As for the OOM, this is not reported in the Apache log file,
though I have experienced an OOM with client version 0.14.5.
[which has probably been fixed, I am told]

Best,
Petr

Re: database destroyed

Posted by Philip Martin <ph...@codematters.co.uk>.
Petr Sebor <pe...@scssoft.com> writes:

> I will try to verify all of that tomorrow. However, the mentioned scenario
> is something like this:

I can carry out the steps below without a problem.

> checkout at one host

$ svn co http://localhost:8888/repo wc

> ..
> .. time passes
> ..
> dumping database begins on localhost

$ svnadmin dump ~/repo > zz

> ..
> .. time passes
> ..
> dump finished

First I get

* Dumped revision 0.
* Dumped revision 1.    # That's 600 files.

Then I get

Checked out revision 1.

> apache2 - internal server error 500
> checkout fails with error

Is it simply that the svnadmin process uses enough memory to push the
machine over its memory limit, and this causes the httpd process to
fail?  Is there a message in the Apache log about a process being
killed?  Or if you are running Linux there may be a message in the
system log from the kernel OOM killer.

> __db.* files are gone.... [db_recover is quiet though]

However, running out of memory wouldn't explain why the __db files are
missing.

-- 
Philip Martin


Re: database destroyed

Posted by Petr Sebor <pe...@scssoft.com>.
> 
>
>That makes no sense... the db files *disappear*?  Dumping the database
>is just another process that is reading data via libsvn_fs.  We should
>have no problem with concurrent db readers.  In fact, we expect
>multiple httpd child processes to do this all the time.
>  
>
Makes no sense to me as well... I just got this error with a notice
"you have to do DB_RECOVER"... so I did. And even though db_recover
gives me no error, the __db.* files are simply gone. I don't know how
this happened, but it is not the first time.

>My only thought is that perhaps there is a clash in permissions.  That
>is, your 'svnadmin dump' process is running as one user, and the 'svn
>co' process is running as different user.  I don't know how this would
>cause db files to vanish, though.
>
>Can you give us a simple reproduction recipe?  Just try dumping during
>a checkout?  Which process should start first?  Which process should
>end first?
>
I will try to verify all of that tomorrow. However, the mentioned scenario
is something like this:

checkout at one host
..
.. time passes
..
dumping database begins on localhost
..
.. time passes
..
dump finished
apache2 - internal server error 500
checkout fails with error
__db.* files are gone.... [db_recover is quiet though]

That's exactly what happened to me three hours ago. And I remember
the same thing happened to me about a week ago as well, when doing a dump
and an update at the same time.
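
The scenario above could be turned into a reproduction script along these
lines. This is only a sketch: the function name, its arguments and the
default delay are made up, and it assumes the checkout and the dump can be
started from the same machine.

```shell
#!/bin/sh
# repro_dump_during_checkout URL REPOS WC DUMPFILE [DELAY]
# Start a checkout, wait DELAY seconds, then dump the repository while the
# checkout is still running, and report how both commands ended.
repro_dump_during_checkout() {
    url=$1          # repository URL served by Apache
    repos=$2        # repository path on the server
    wc=$3           # working copy directory to create
    dumpfile=$4     # file to receive the dump
    delay=${5:-5}   # seconds to let the checkout get going (a guess)

    # The checkout starts first and runs in the background.
    svn co "$url" "$wc" >/dev/null &
    co_pid=$!

    sleep "$delay"

    # The dump begins while the checkout is in progress.
    svnadmin dump "$repos" > "$dumpfile"
    dump_status=$?

    # Collect the checkout's exit status once it finishes.
    wait "$co_pid"
    co_status=$?

    echo "dump exit status: $dump_status"
    echo "checkout exit status: $co_status"
}
```

A non-zero checkout status right after the dump finishes would match the
failure described above.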

Petr




Re: database destroyed

Posted by Blair Zajac <bl...@orcaware.com>.
Ben Collins-Sussman wrote:
> 
> Petr Sebor <pe...@scssoft.com> writes:
> 
> > When someone happens to operate on the database (co/up) while the dump
> > is running, it always ends up corrupt (so I have to run db_recover on
> > it) or, worse, the db/__db.001 ... 005 files simply disappear.
> > The only cure I know is to svnadmin load the dump back in to regain
> > a working repository :-(
> 
> That makes no sense... the db files *disappear*?  Dumping the database
> is just another process that is reading data via libsvn_fs.  We should
> have no problem with concurrent db readers.  In fact, we expect
> multiple httpd child processes to do this all the time.

Are you using hot-backup.py?  It deletes the log files that are no
longer required after they are backed up.

Blair

-- 
Blair Zajac <bl...@orcaware.com>
Plots of your system's performance plots - http://www.orcaware.com/orca/


Re: database destroyed

Posted by Ben Collins-Sussman <su...@collab.net>.
Petr Sebor <pe...@scssoft.com> writes:

> When someone happens to operate on the database (co/up) while the dump
> is running, it always ends up corrupt (so I have to run db_recover on
> it) or, worse, the db/__db.001 ... 005 files simply disappear.
> The only cure I know is to svnadmin load the dump back in to regain
> a working repository :-(

That makes no sense... the db files *disappear*?  Dumping the database
is just another process that is reading data via libsvn_fs.  We should
have no problem with concurrent db readers.  In fact, we expect
multiple httpd child processes to do this all the time.

> 
> And I am really not sure why 'svnadmin dump' needs RW access to
> the database?

This is how Berkeley DB works -- even when reading, a lockfile is
still created in the repository.

My only thought is that perhaps there is a clash in permissions.  That
is, your 'svnadmin dump' process is running as one user, and the 'svn
co' process is running as different user.  I don't know how this would
cause db files to vanish, though.
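
A minimal sketch of a check for that kind of permissions clash. The helper
name can_use_repo_db is hypothetical; the idea is to run it once as each
user that touches the repository (the cron user and the Apache user) and
compare the results.

```shell
#!/bin/sh
# can_use_repo_db REPOS
# Succeed only if REPOS/db exists and the current user may enter it and
# create files there, which Berkeley DB needs even for read access.
can_use_repo_db() {
    dbdir=$1/db
    [ -d "$dbdir" ] && [ -w "$dbdir" ] && [ -x "$dbdir" ]
}
```

For example: can_use_repo_db /devel/svn/repos || echo "no write access to db".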

Can you give us a simple reproduction recipe?  Just try dumping during
a checkout?  Which process should start first?  Which process should
end first?
