Posted to users@subversion.apache.org by Marius Gedminas <mg...@b4net.lt> on 2005/10/22 12:12:51 UTC

OOM problems

I keep bits of my home directory in a Subversion repository.  It has 600
revisions and occupies 250 megs of disk space.  It uses bsddb.  I kept
it on a server running Subversion 1.1.4 from Debian sarge, and committed
stuff to it from my laptop running Subversion 1.2.0 from Ubuntu breezy
over svn+ssh.

Usage pattern was normal -- a number of small checkins.  I have added a
bunch of files totaling 90 megs in rev 571, and another bunch totaling
76 megs in rev 597.

Two days ago I tried to commit a small patch, but the commit timed out.
I retried a few times, then gave up assuming network problems.

Yesterday the same thing happened, and I sshed into the server to take
a look.  I saw a bunch of svnserve processes.  I killed them, ran
svnadmin recover just in case.  Then I decided to do 'svn up' in the
root of the checkout on my laptop.  Nothing happened for a while, but
after a few minutes svnserve on the server side ate all available RAM
and swap, and caused the kernel to invoke the OOM killer.  The server
has 512 megs of RAM and 2 gigs of swap!

I thought those two big commits were causing a problem.  I decided to
use svnadmin dump and svndumpfilter.  svnadmin dump dumped 5 revisions
successfully and started eating all the memory while processing rev 6.
I killed it.  svnadmin verify acts likewise.

I scp'ed the repository to my laptop, ran svnadmin recover for good
measure, and retried svnadmin dump with Subversion 1.2.0.  The same
thing happened.  svnadmin eats about 300 megs of RAM in 20 seconds, then
I kill it.

I have tried dumping random revisions with svnadmin dump -r N
--incremental, and looking at their size with wc -l.  There are 28
revisions out of 600 that I cannot dump without running out of RAM:
6, 7, 17, 25, 32, 39, ..., 229, 276, 458, 595.  The two large commits
that I suspected (571 and 597) are not among them.
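
The per-revision probing described above could be scripted roughly as follows. This is only a sketch: the helper names dump_ok and find_bad_revisions, the repository path argument and the 30-second timeout are assumptions for illustration, not something used in this thread. The timeout stands in for killing svnadmin by hand; it does not cap memory, and the sketch discards the dump output instead of counting lines with wc -l.

```python
import subprocess
from typing import Callable, Iterable, List


def dump_ok(repo: str, rev: int, timeout: float = 30.0,
            run: Callable = subprocess.run) -> bool:
    """Return True if a single-revision incremental dump finishes in time.

    `run` is injectable so the function can be exercised without svnadmin.
    """
    cmd = ["svnadmin", "dump", repo, "-r", str(rev), "--incremental"]
    try:
        result = run(cmd,
                     stdout=subprocess.DEVNULL,
                     stderr=subprocess.DEVNULL,
                     timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child when the timeout expires.
        return False
    return result.returncode == 0


def find_bad_revisions(revs: Iterable[int],
                       probe: Callable[[int], bool]) -> List[int]:
    """Collect the revisions for which `probe` reports failure."""
    return [rev for rev in revs if not probe(rev)]
```

It might be called as find_bad_revisions(range(1, 601), lambda r: dump_ok("/path/to/repos", r)), where the path is a placeholder.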

I can access all log messages with svnlook log -r N with no problems.

What do I do now?

Marius Gedminas
-- 
Any time somebody tells you that you shouldn't do something because it's
"unprofessional," you know that they've run out of real arguments.
		-- Joel Spolsky

Re: OOM problems

Posted by Joshua Varner <jl...@gmail.com>.
On 11/1/05, Marius Gedminas <mg...@b4net.lt> wrote:
> On Sat, Oct 22, 2005 at 03:12:51PM +0300, Marius Gedminas wrote:
> > I keep bits of my home directory in a Subversion repository.  It has 600
> > revisions and occupies 250 megs of disk space.  It uses bsddb.  I kept
> > it on a server running Subversion 1.1.4 from Debian sarge, and committed
> > stuff to it from my laptop running Subversion 1.2.0 from Ubuntu breezy
> > over svn+ssh.
> ...
> > Then I decided to do 'svn up' in the
> > root of the checkout on my laptop.  Nothing happened for a while, but
> > after a few minutes svnserve on the server side ate all available RAM
> > and swap, and caused the kernel to invoke the OOM killer.  The server
> > has 512 megs of RAM and 2 gigs of swap!
> >
> > I thought those two big commits were causing a problem.  I decided to
> > use svnadmin dump and svndumpfilter.  svnadmin dump dumped 5 revisions
> > successfully and started eating all the memory while processing rev 6.
> > I killed it.  svnadmin verify acts likewise.
> > I scp'ed the repository to my laptop, ran svnadmin recover for good
> > measure, and retried svnadmin dump with Subversion 1.2.0.  The same
> > thing happened.  svnadmin eats about 300 megs of RAM in 20 seconds, then
> > I kill it.
>
> This is now http://subversion.tigris.org/issues/show_bug.cgi?id=2430
>
Unless you are willing to provide the repository to someone with serious
bdb knowledge, nothing can be done.  You'll have to post it online or send
it to someone via private e-mail.  The bug you filed simply does not contain
enough information for anyone to act on.

You might try dumping it with 1.3 when it is released.

Josh

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
For additional commands, e-mail: users-help@subversion.tigris.org


Re: OOM problems

Posted by Marius Gedminas <mg...@b4net.lt>.
On Sat, Oct 22, 2005 at 03:12:51PM +0300, Marius Gedminas wrote:
> I keep bits of my home directory in a Subversion repository.  It has 600
> revisions and occupies 250 megs of disk space.  It uses bsddb.  I kept
> it on a server running Subversion 1.1.4 from Debian sarge, and committed
> stuff to it from my laptop running Subversion 1.2.0 from Ubuntu breezy
> over svn+ssh.
...
> Then I decided to do 'svn up' in the
> root of the checkout on my laptop.  Nothing happened for a while, but
> after a few minutes svnserve on the server side ate all available RAM
> and swap, and caused the kernel to invoke the OOM killer.  The server
> has 512 megs of RAM and 2 gigs of swap!
> 
> I thought those two big commits were causing a problem.  I decided to
> use svnadmin dump and svndumpfilter.  svnadmin dump dumped 5 revisions
> successfully and started eating all the memory while processing rev 6.
> I killed it.  svnadmin verify acts likewise.
> I scp'ed the repository to my laptop, ran svnadmin recover for good
> measure, and retried svnadmin dump with Subversion 1.2.0.  The same
> thing happened.  svnadmin eats about 300 megs of RAM in 20 seconds, then
> I kill it.

This is now http://subversion.tigris.org/issues/show_bug.cgi?id=2430

Marius Gedminas
-- 
(define the-question (or (* 2 b) (not (* 2 b))))

Re: OOM problems

Posted by Marius Gedminas <mg...@b4net.lt>.
Background: somehow a BDB repository was corrupted and started looping
forever eating more and more memory, until the kernel's OOM killer
intervened.  http://subversion.tigris.org/issues/show_bug.cgi?id=2430

On Wed, Nov 02, 2005 at 06:24:04PM +0200, Marius Gedminas wrote:
> Ok, here's what happens: the do..while loop in rep_read_range never
> finishes.  rep_key inside it alternates between "2y1" and "4hq".
> 
> Looks like a loop in a data structure that should not contain loops.
> Fun fun fun.
> 
> According to log messages, revs 458 and 595 only changed svn:ignore
> properties.  I think (although I cannot prove) that the problem is with
> svn:ignore on a single directory.

I patched dump_node (libsvn_repos/dump.c) and delta_proplists
(libsvn_repos/delta.c) to skip property lists when path was "/", and
that way got a full repository dump (without any properties on the
topmost directory).  That makes me happy (not as happy as I would be if
subversion had never broken down in the first place, but much happier
than if I had lost all the history).

Marius Gedminas
-- 
Microsoft has performed an illegal operation and will be shut down.
		-- Judge Jackson

Re: OOM problems

Posted by Marius Gedminas <mg...@b4net.lt>.
The second half of this email contains more interesting information.

On Wed, Nov 02, 2005 at 05:25:10AM -0500, John Szakmeister wrote:
> On Saturday 22 October 2005 08:12, Marius Gedminas wrote:
> > I scp'ed the repository to my laptop, ran svnadmin recover for good
> > measure, and retried svnadmin dump with Subversion 1.2.0.  The same
> > thing happened.  svnadmin eats about 300 megs of RAM in 20 seconds, then
> > I kill it.
> 
> Be careful when doing such things.  BDB is sensitive to platform, OS, and 
> version of the library.  If any of those things changed (and it appears that 
> at least the version of BDB might have changed), then you might have run into 
> a side effect that prevented you from dumping the repository.

I'll keep that in mind.  Versions of libdb4.2 are pretty close
(4.2.52-18 from Debian on the original server; 4.2.52-19ubuntu4 on my
laptop), but have been compiled with different versions of gcc.

(I'll reiterate that svnadmin dump fails on the server in the same way,
so a different libdb4.2 cannot be the only reason).

> > I have tried dumping random revisions with svnadmin dump -r N
> > --incremental, and looking at their size with wc -l.  There are 28
> > revisions out of 600 that I cannot dump without running out of RAM:
> > 6, 7, 17, 25, 32, 39, ..., 229, 276, 458, 595.  The two large commits
> > that I suspected (571 and 597) are not among them.
> >
> > I can access all log messages with svnlook log -r N with no problems.
> 
> That only touches part of the database.  Try 'svn diff -r5:6 url://to/repo'.
> That will pull out the entire changeset for that revision.

svn diff -r5:6 fails in the same way (out of memory).

> > What do I do now?
> 
> You have a couple of choices.  If you can tar the repo up someplace, and email 
> me the link, I can take a closer look at the problem.  In the event you can't 
> do that (because of intellectual property concerns), then there is something 
> you can try (and it might be good to do so first).

There are no IP concerns (this repository serves as a backup of my home
directory), but there are some privacy concerns (it contains things like
instant messaging chat logs).  Although I've tried to keep various
passwords and SSH/GPG keys out of it, I'm not entirely sure none have
crept in.

I will think about it.  I would prefer acquiring sufficient knowledge of
subversion internals/bsddb to be able to debug the problem myself,
perhaps with some guidance.  Do you think that is unrealistic?

> Make a copy of the repository, and then attempt a catastrophic recovery: 
> db_recover -c -v -h /path/to/repos/db.  I believe there was one occasion 
> where I saw a similar behavior, and a catastrophic recovery fixed the 
> situation.  To be safe, I'd dump and load the repository if the catastrophic 
> recovery was successful.

Thank you for the suggestion.  Alas, it did not help.

By the way, db4.2_verify reports no errors on any of the database
files.  svnadmin verify runs out of memory.  I will compile subversion
with debug symbols and try to poke around.

(time passes)

Ok, here's what happens: the do..while loop in rep_read_range never
finishes.  rep_key inside it alternates between "2y1" and "4hq".

The rep with key "4hq" is of rep_kind_delta kind, with txn_id =
0x80a0198 "", and contents.delta contains exactly one chunk {version = 0
'\0', offset = 0, string_key = 0x80a01e0 "87b", size = 1485, rep_key =
0x80a01f0 "2y1"}.

The rep with key "2y1" is of rep_kind_delta kind, with txn_id =
0x80a03b0 "", and contents.delta contains exactly one chunk {version = 0
'\0', offset = 0, string_key = 0x80a03f8 "84w", size = 1384, rep_key =
0x80a0408 "4hq"}.

Looks like a loop in a data structure that should not contain loops.
Fun fun fun.

I added a printf to that loop, and hacked up a second array of rep_keys, with
an inner for loop to look for duplicates (since I'm not familiar with apr's
hash tables).  Here's the chain of looping rep_keys when I run svnadmin 
dump --incremental -r 6 on my repository:

  rep_read_range(rep_key="i")
   `> loading rep "i"
   `> loading rep "1r"
   `> loading rep "7f"
   `> loading rep "120"
   `> loading rep "1fr"
   `> loading rep "1sw"
   `> loading rep "2y1"
   `> loading rep "4hq"
  svnadmin: Looping rep_key '2y1'
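
For readers who want the idea without the C code, here is a minimal Python sketch of the duplicate check described above. It is not the actual patch: it uses a set in place of the second array with an inner for loop, and it models each rep as having exactly one delta base, which is a simplification. The names walk_rep_chain and RepLoopError and the toy reps table are made up for illustration; only the keys are taken from the trace.

```python
from typing import Dict, List, Optional


class RepLoopError(Exception):
    """Raised when a chain of rep keys revisits a key it already loaded."""


def walk_rep_chain(reps: Dict[str, Optional[str]], start: str) -> List[str]:
    """Follow delta-base links from `start` until a rep with no base.

    `reps` maps a rep key to the key of the rep it is a delta against,
    or to None when the rep has no base.
    """
    seen = set()
    chain: List[str] = []
    key: Optional[str] = start
    while key is not None:
        if key in seen:
            # Same wording as the diagnostic printed in the trace above.
            raise RepLoopError(f"Looping rep_key '{key}'")
        seen.add(key)
        chain.append(key)
        key = reps[key]
    return chain


# Toy table modelled on the trace; the real data lives in the BDB tables.
reps = {
    "i": "1r", "1r": "7f", "7f": "120", "120": "1fr",
    "1fr": "1sw", "1sw": "2y1", "2y1": "4hq", "4hq": "2y1",
}
```

With this table, walk_rep_chain(reps, "i") raises RepLoopError once it reaches "2y1" for the second time, instead of allocating memory forever.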

It appears that all other broken revisions (in my original email I
listed 6, 7, 17, 25, 32, 39, (skipped a bunch of them in the middle), 229,
276, 458, 595, and I just checked all of these) end up with this cycle.
"2y1" is passed directly as the rep_key argument to rep_read_range when
I try to dump rev 458, and "4hq" is likewise passed when I try to read
rev 595.

According to log messages, revs 458 and 595 only changed svn:ignore
properties.  I think (although I cannot prove) that the problem is with
svn:ignore on a single directory.


Dear Subversion developers, would you mind adding such a loop check to
rep_read_range?  I can send my uber-hacky diff to pinpoint the place
in the code, if necessary.

Cheers,
Marius Gedminas
-- 
Voodoo Programming:  Things programmers do that they know shouldn't work but
they try anyway, and which sometimes actually work, such as recompiling
everything.
-- Karl Lehenbauer

Re: OOM problems

Posted by John Szakmeister <jo...@szakmeister.net>.
On Saturday 22 October 2005 08:12, Marius Gedminas wrote:
> I keep bits of my home directory in a Subversion repository.  It has 600
> revisions and occupies 250 megs of disk space.  It uses bsddb.  I kept
> it on a server running Subversion 1.1.4 from Debian sarge, and committed
> stuff to it from my laptop running Subversion 1.2.0 from Ubuntu breezy
> over svn+ssh.
>
> Usage pattern was normal -- a number of small checkins.  I have added a
> bunch of files totaling 90 megs in rev 571, and another bunch totaling
> 76 megs in rev 597.
>
> Two days ago I tried to commit a small patch, but the commit timed out.
> I retried a few times, then gave up assuming network problems.
>
> Yesterday the same thing happened, and I sshed into the server to take
> a look.  I saw a bunch of svnserve processes.  I killed them, ran
> svnadmin recover just in case.  Then I decided to do 'svn up' in the
> root of the checkout on my laptop.  Nothing happened for a while, but
> after a few minutes svnserve on the server side ate all available RAM
> and swap, and caused the kernel to invoke the OOM killer.  The server
> has 512 megs of RAM and 2 gigs of swap!
>
> I thought those two big commits were causing a problem.  I decided to
> use svnadmin dump and svndumpfilter.  svnadmin dump dumped 5 revisions
> successfully and started eating all the memory while processing rev 6.
> I killed it.  svnadmin verify acts likewise.
>
> I scp'ed the repository to my laptop, ran svnadmin recover for good
> measure, and retried svnadmin dump with Subversion 1.2.0.  The same
> thing happened.  svnadmin eats about 300 megs of RAM in 20 seconds, then
> I kill it.

Be careful when doing such things.  BDB is sensitive to platform, OS, and 
version of the library.  If any of those things changed (and it appears that 
at least the version of BDB might have changed), then you might have run into 
a side effect that prevented you from dumping the repository.

> I have tried dumping random revisions with svnadmin dump -r N
> --incremental, and looking at their size with wc -l.  There are 28
> revisions out of 600 that I cannot dump without running out of RAM:
> 6, 7, 17, 25, 32, 39, ..., 229, 276, 458, 595.  The two large commits
> that I suspected (571 and 597) are not among them.
>
> I can access all log messages with svnlook log -r N with no problems.

That only touches part of the database.  Try 'svn diff -r5:6 url://to/repo'.
That will pull out the entire changeset for that revision.

> What do I do now?

You have a couple of choices.  If you can tar the repo up someplace, and email 
me the link, I can take a closer look at the problem.  In the event you can't 
do that (because of intellectual property concerns), then there is something 
you can try (and it might be good to do so first).

Make a copy of the repository, and then attempt a catastrophic recovery: 
db_recover -c -v -h /path/to/repos/db.  I believe there was one occasion 
where I saw a similar behavior, and a catastrophic recovery fixed the 
situation.  To be safe, I'd dump and load the repository if the catastrophic 
recovery was successful.
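
The copy-then-recover step above could be wrapped in a small script along these lines. This is a sketch under assumptions: the function name recover_copy, the "-recovery" suffix and the scratch directory are invented for illustration, it assumes nothing else has the repository open while it is copied, and on some distributions the binary may carry a version prefix rather than being called plain db_recover.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Callable


def recover_copy(repo: str, scratch: str,
                 run: Callable = subprocess.run) -> Path:
    """Copy `repo` into `scratch` and run catastrophic recovery on the copy.

    The original repository is never modified. `run` is injectable so the
    function can be exercised without Berkeley DB installed.
    """
    src = Path(repo)
    dst = Path(scratch) / (src.name + "-recovery")
    shutil.copytree(src, dst)  # fails if dst already exists
    # -c: catastrophic recovery, -v: verbose, -h: database home directory
    cmd = ["db_recover", "-c", "-v", "-h", str(dst / "db")]
    run(cmd, check=True)
    return dst
```

If recovery succeeds, the returned path is the copy to dump and load from.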

HTH.

-John
