Posted to dev@subversion.apache.org by Keith Bostic <bo...@abyssinian.sleepycat.com> on 2004/12/08 17:38:38 UTC
Subversion's use of Berkeley DB [#11511]
Hi, my name is Keith Bostic and I'm with Sleepycat Software.
We (Sleepycat Software) are getting beaten up periodically
because Subversion users have problems with Berkeley DB, and I'd
like to see if we can fix that once and for all. To that end,
I've been talking with Mike Pilato over the past few days about
how Subversion uses Berkeley DB, and where problems might be.
There were three issues we found. I'm going to describe them
in this email, and I'm happy to answer any questions anyone has.
Then, Mike and I were hoping to find someone willing to sign up
for making whatever code changes are needed in Subversion.
1. The Subversion code is not setting the Berkeley DB cache size.
Given Berkeley DB's small default cache size (256KB), and the
expected good locality of reference for Subversion queries,
I think Subversion will be able to increase performance by
setting the cache size.
You can set the cache in the DB_CONFIG file, or by using
the DbEnv::set_cachesize method:
http://www.sleepycat.com/docs/api_c/env_set_cachesize.html
For more information, see the "Selecting a cache size"
section of the Berkeley DB Reference Guide, included in your
download package and also available at:
http://www.sleepycat.com/docs/ref/am_conf/cachesize.html
Action Items:
Investigate the efficiency of the current Subversion cache
(using the Berkeley DB db_stat utility), and see if there's
benefit to be had by increasing the cache size.
Change Subversion to specify a cache size whenever creating
a Berkeley DB database environment.
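A minimal sketch of the DB_CONFIG route mentioned in item 1, assuming the documented one-line format "set_cachesize <gbytes> <bytes> <ncache>". The helper name format_cachesize_line is hypothetical, and the 8 MB figure in the usage note is only an illustration; nothing in this message settles on a size.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define GIGABYTE ((uint64_t)1 << 30)

/* Format a DB_CONFIG cache-size line for a total cache of total_bytes,
   split into whole gigabytes plus the remaining bytes, as the
   set_cachesize entry expects.  ncache is the number of cache regions.
   Returns the value of snprintf(). */
int format_cachesize_line(char *buf, size_t buflen,
                          uint64_t total_bytes, int ncache)
{
    uint64_t gbytes = total_bytes / GIGABYTE;
    uint64_t bytes = total_bytes % GIGABYTE;

    return snprintf(buf, buflen,
                    "set_cachesize %" PRIu64 " %" PRIu64 " %d\n",
                    gbytes, bytes, ncache);
}
```

For example, asking for 8 MB in a single region yields the line "set_cachesize 0 8388608 1", which could be written into the DB_CONFIG file in the environment's home directory.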
2. Subversion users are occasionally seeing "out of memory
errors". The Subversion code has recently added an error
callback routine, so future occurrences of this problem
should result in the detailed Berkeley DB error message being
available for later debugging.
Given the default 256KB cache size, and using, for example,
16KB database page sizes, 8 threads of control in the
database at the same time, each grabbing 2 pages, will run
the cache out of room, resulting in this failure. So,
increasing the cache size may very well fix this problem.
Action Items:
None at this time.
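The arithmetic behind item 2 can be sketched as follows. This is a simplified model: it ignores whatever bookkeeping overhead the cache itself needs, and the function names are hypothetical. Page size is assumed to be nonzero.

```c
#include <stddef.h>

/* Whole pages that fit in the cache, ignoring internal overhead. */
size_t cache_capacity_pages(size_t cache_bytes, size_t page_bytes)
{
    return cache_bytes / page_bytes;
}

/* Pages held at once when every thread of control pins the same number. */
size_t pinned_pages(size_t threads, size_t pages_per_thread)
{
    return threads * pages_per_thread;
}

/* Nonzero when the pinned pages use up (or exceed) the whole cache,
   leaving no room for the next page request. */
int cache_exhausted(size_t cache_bytes, size_t page_bytes,
                    size_t threads, size_t pages_per_thread)
{
    return pinned_pages(threads, pages_per_thread)
           >= cache_capacity_pages(cache_bytes, page_bytes);
}
```

With the numbers from the message, a 256KB cache of 16KB pages holds 16 pages, and 8 threads pinning 2 pages each also hold 16, so the cache has no free page left.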
3. Subversion isn't recovering the database after application or
system failure -- it's only running recovery if Berkeley DB
explicitly returns DB_RUNRECOVERY.
This is likely the source of the periodic corruption Subversion
users have seen.
The problem is Subversion is itself a library, with different
top-layer interfaces, Apache and standalone administrative
programs among them. To solve this problem we're going to
need to find a way for the Subversion library to know if a
thread of control entering Subversion code is the first
thread of control to access the Berkeley DB database
environment so it can run recovery as it opens the database
environment.
This is the problem that George Schlossnagle had to solve for
integrating Berkeley DB with the Apache mod_db4 module, and
it's a standard problem for Sleepycat Software customers
using Berkeley DB in multi-process environments. The fact
that Subversion is a library, and that a Subversion installation
cannot modify system startup procedures, complicates things
somewhat, though.
There already appears to be some code in Subversion that tries
to detect when Subversion is creating a database environment,
so it may be simpler than we think.
Action Items:
This item may need more discussion.
As a springboard for that discussion, I propose we find a
serialization point for all threads of control using a
Subversion repository so we can determine if a thread of
control is the first thread of control entering the database
environment after a possible application or system failure.
Regards,
--keith
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Keith Bostic bostic@sleepycat.com
Sleepycat Software Inc. keithbosticim (ymsgid)
118 Tower Rd. +1-781-259-3139
Lincoln, MA 01773 http://www.sleepycat.com
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Re: Subversion's use of Berkeley DB [#11511]
Posted by Greg Hudson <gh...@MIT.EDU>.
On Wed, 2004-12-08 at 12:38, Keith Bostic wrote:
> 3. Subversion isn't recovering the database after application or
> system failure -- it's only running recovery if Berkeley DB
> explicitly returns DB_RUNRECOVERY.
I'd like to point out that, in principle, anything Subversion can do to
work around this problem (in its current architecture) could be done by
the Berkeley DB library itself. Perhaps more robustly, as Subversion's
BDB tables could conceivably be accessed without going through the
Subversion libraries.
---------------------------------------------------------------------
Re: Subversion's use of Berkeley DB [#11511]
Posted by Branko Čibej <br...@xbc.nu>.
Philip Martin wrote:
>Philip Martin <ph...@codematters.co.uk> writes:
>
>
>
>>Should be simple to implement, but I don't know what sort of
>>performance effect it would have. Do we really want to run recovery
>>that often?
>>
>>
>
>A quick 'n' dirty implementation
>
>Index: subversion/libsvn_repos/repos.c
>===================================================================
>--- subversion/libsvn_repos/repos.c (revision 12263)
>+++ subversion/libsvn_repos/repos.c (working copy)
>@@ -1075,6 +1075,9 @@
> const char *lockfile_path;
> svn_error_t *err;
>
>+ if (! exclusive)
>+ SVN_ERR (svn_repos_recover2 (path, TRUE, NULL, NULL, pool));
>+
> /* Get a filehandle for the repository's db lockfile. */
> lockfile_path = svn_repos_db_lockfile (repos, pool);
>
>
>$ time for i in `seq 1 30`;do svn st -u wc > /dev/null;done
>
>Without patch:
>real 0m15.294s
>user 0m2.450s
>sys 0m0.620s
>
>With patch:
>real 0m24.186s
>user 0m3.620s
>sys 0m1.480s
>
>
Yes, we definitely don't want to recover every time...
-- Brane
---------------------------------------------------------------------
Re: Subversion's use of Berkeley DB [#11511]
Posted by Philip Martin <ph...@codematters.co.uk>.
Philip Martin <ph...@codematters.co.uk> writes:
> Should be simple to implement, but I don't know what sort of
> performance effect it would have. Do we really want to run recovery
> that often?
A quick 'n' dirty implementation
Index: subversion/libsvn_repos/repos.c
===================================================================
--- subversion/libsvn_repos/repos.c (revision 12263)
+++ subversion/libsvn_repos/repos.c (working copy)
@@ -1075,6 +1075,9 @@
   const char *lockfile_path;
   svn_error_t *err;
 
+  if (! exclusive)
+    SVN_ERR (svn_repos_recover2 (path, TRUE, NULL, NULL, pool));
+
   /* Get a filehandle for the repository's db lockfile. */
   lockfile_path = svn_repos_db_lockfile (repos, pool);

$ time for i in `seq 1 30`;do svn st -u wc > /dev/null;done
Without patch:
real 0m15.294s
user 0m2.450s
sys 0m0.620s
With patch:
real 0m24.186s
user 0m3.620s
sys 0m1.480s
--
Philip Martin
---------------------------------------------------------------------
Re: Subversion's use of Berkeley DB [#11511]
Posted by Branko Čibej <br...@xbc.nu>.
Branko Čibej wrote:
> Keith, in the past I noticed that when a process with that has opened
> a BDB environment crashes, leving love locks behind,
Me and my sloppy spelling... "when a process that has opened a BDB
environment crashes, leaving live locks behind"
-- Brane
---------------------------------------------------------------------
Re: Subversion's use of Berkeley DB [#11511]
Posted by Branko Čibej <br...@xbc.nu>.
Philip Martin wrote:
>Branko Čibej <br...@xbc.nu> writes:
>
>
>
>>C. Michael Pilato wrote:
>>
>>
>>
>>>Keith Bostic <bo...@abyssinian.sleepycat.com> writes:
>>>
>>>
>>>
>>>> As a springboard for that discussion, I propose we find a
>>>> serialization point for all threads of control using a
>>>> Subversion repository so we can determine if a thread of
>>>> control is the first thread of control entering the database
>>>> environment after a possible application or system failure.
>>>>
>>>>
>>>My suggestion is that libsvn_fs_base grows the same serialization that
>>>mod_db4 uses, which is based around the use of a shared memory segment
>>>with a reference count in it.
>>>
>>>
>>>
>>There is a pretty fundamental problem with using a reference count
>>like that. If a process that is accessing BDB crashes after having
>>incremented the refcount, the refcount gets out of sync and is
>>useless. Also, other processes that are already running might wedge on
>>a lock owned by the crashed process. I see no way to resolve this.
>>
>>
>
>We have the libsvn_repos lock, which is the current mechanism that
>ensures that recovery gets exclusive access. Normal read/write
>repository access, such as svn_repos_open(), takes a non-exclusive
>lock, while "special" access, such as svn_repos_recover(), takes an
>exclusive lock.
>
>Suppose svn_repos_open() were first to make a non-blocking attempt to
>take an exclusive lock, if that fails it just carries on as at present
>and tries to take a non-exclusive lock. If however svn_repos_open()
>manages to take an exclusive lock then it can do whatever it wants
>(perhaps run recovery?) and then it can drop the exclusive lock, take
>a non-exclusive one and continue as at present.
>
>This would mean that recovery would get run whenever anything accessed
>a repository that was not otherwise being accessed, is that the
>desired behaviour? Should be simple to implement, but I don't know
>what sort of performance effect it would have. Do we really want to
>run recovery that often?
>
>
Probably not; some sort of counter would be nice, with a forced check
every N times (like forced fscks on some filesystems). Of course, we
should also force a check any time some process gets a DB_RUNRECOVERY
from BDB.
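The "forced check every N times" idea can be sketched as a small decision function. Everything here is hypothetical: the name should_run_recovery, the interval parameter, and where the open counter would be stored are all left open by this message.

```c
/* Decide whether this open should run recovery.
   opens_since_recovery: opens counted since recovery last ran.
   interval: force a check on every interval-th open; 0 disables
             the periodic check.
   saw_runrecovery: nonzero if some process got DB_RUNRECOVERY. */
int should_run_recovery(unsigned long opens_since_recovery,
                        unsigned long interval,
                        int saw_runrecovery)
{
    if (saw_runrecovery)
        return 1;               /* always recover after an explicit error */
    if (interval == 0)
        return 0;               /* periodic checks switched off */
    return opens_since_recovery >= interval;
}
```

A caller would reset the counter to zero after a successful recovery and increment it on every other open.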
Keith, in the past I noticed that when a process with that has opened a
BDB environment crashes, leving love locks behind, other processes that
are using the same environment may hang indefinitely. This is what
usually causes SVN repositories to get in a "wedged" state. This leads
me to believe that relying on BDB to detect this situation and return
DB_RUNRECOVERY to the other processes isn't reliable.
Am I missing something here, or is using a separate server process with
exclusive access to the repository truly the only way to completely
avoid such hangs?
-- Brane
---------------------------------------------------------------------
Re: Subversion's use of Berkeley DB [#11511]
Posted by Philip Martin <ph...@codematters.co.uk>.
Branko Čibej <br...@xbc.nu> writes:
> C. Michael Pilato wrote:
>
>>Keith Bostic <bo...@abyssinian.sleepycat.com> writes:
>>
>>> As a springboard for that discussion, I propose we find a
>>> serialization point for all threads of control using a
>>> Subversion repository so we can determine if a thread of
>>> control is the first thread of control entering the database
>>> environment after a possible application or system failure.
>>
>>My suggestion is that libsvn_fs_base grows the same serialization that
>>mod_db4 uses, which is based around the use of a shared memory segment
>>with a reference count in it.
>>
> There is a pretty fundamental problem with using a reference count
> like that. If a process that is accessing BDB crashes after having
> incremented the refcount, the refcount gets out of sync and is
> useless. Also, other processes that are already running might wedge on
> a lock owned by the crashed process. I see no way to resolve this.
We have the libsvn_repos lock, which is the current mechanism that
ensures that recovery gets exclusive access. Normal read/write
repository access, such as svn_repos_open(), takes a non-exclusive
lock, while "special" access, such as svn_repos_recover(), takes an
exclusive lock.
Suppose svn_repos_open() were first to make a non-blocking attempt to
take an exclusive lock. If that fails, it just carries on as at present
and tries to take a non-exclusive lock. If, however, svn_repos_open()
manages to take an exclusive lock, then it can do whatever it wants
(perhaps run recovery?), after which it can drop the exclusive lock, take
a non-exclusive one and continue as at present.
This would mean that recovery would get run whenever anything accessed
a repository that was not otherwise being accessed. Is that the
desired behaviour? Should be simple to implement, but I don't know
what sort of performance effect it would have. Do we really want to
run recovery that often?
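The locking sequence proposed here can be sketched with flock() as a stand-in for the repository lock. This is not Subversion code: Subversion takes its lock through its own portability layer, and the names open_with_recovery_check, recover_fn and counting_recover are hypothetical.

```c
#include <errno.h>
#include <sys/file.h>   /* flock(), LOCK_EX, LOCK_SH, LOCK_NB, LOCK_UN */

/* Hypothetical callback type: runs recovery, returns 0 on success. */
typedef int (*recover_fn)(void *baton);

/* Stand-in recovery callback for experiments: counts its invocations. */
int counting_recover(void *baton)
{
    ++*(int *)baton;
    return 0;
}

/* Make a non-blocking attempt at the exclusive lock.  If it succeeds,
   nobody else has the lock file locked, so run recovery and then
   switch to the shared lock.  Otherwise take the shared lock as usual.
   Returns 0 with a shared lock held on fd, or -1 on failure.
   *ran_recovery reports whether the callback was invoked. */
int open_with_recovery_check(int fd, recover_fn recover, void *baton,
                             int *ran_recovery)
{
    *ran_recovery = 0;

    if (flock(fd, LOCK_EX | LOCK_NB) == 0) {
        *ran_recovery = 1;
        if (recover(baton) != 0) {
            flock(fd, LOCK_UN);
            return -1;
        }
        /* Converting a flock() lock is not guaranteed to be atomic,
           so another opener may slip in between the two states. */
        return flock(fd, LOCK_SH) == 0 ? 0 : -1;
    }
    if (errno != EWOULDBLOCK)
        return -1;              /* a real error, not contention */

    return flock(fd, LOCK_SH) == 0 ? 0 : -1;
}
```

The first opener of an idle repository runs the callback; a second opener arriving while the first still holds its shared lock skips it. That is exactly the behaviour whose cost the question above asks about.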
--
Philip Martin
---------------------------------------------------------------------
Re: Subversion's use of Berkeley DB [#11511]
Posted by Branko Čibej <br...@xbc.nu>.
C. Michael Pilato wrote:
>Keith Bostic <bo...@abyssinian.sleepycat.com> writes:
>
>
>
>> As a springboard for that discussion, I propose we find a
>> serialization point for all threads of control using a
>> Subversion repository so we can determine if a thread of
>> control is the first thread of control entering the database
>> environment after a possible application or system failure.
>>
>>
>
>My suggestion is that libsvn_fs_base grows the same serialization that
>mod_db4 uses, which is based around the use of a shared memory segment
>with a reference count in it.
>
There is a pretty fundamental problem with using a reference count like
that. If a process that is accessing BDB crashes after having
incremented the refcount, the refcount gets out of sync and is
useless. Also, other processes that are already running might wedge on a
lock owned by the crashed process. I see no way to resolve this.
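The failure mode can be shown with a toy model of the counter. This is a sketch only, not mod_db4's code: struct env_refcount stands in for the counter in shared memory, and env_enter/env_leave are hypothetical names.

```c
#include <stdbool.h>

/* Stand-in for a reference count kept in a shared memory segment. */
struct env_refcount {
    unsigned count;
};

/* Register a new user of the environment.  Returns true when the
   caller is the first user and should therefore open with recovery. */
bool env_enter(struct env_refcount *rc)
{
    bool first = (rc->count == 0);
    rc->count++;
    return first;
}

/* Deregister on a clean exit.  A crashed process never gets here. */
void env_leave(struct env_refcount *rc)
{
    if (rc->count > 0)
        rc->count--;
}
```

If process A calls env_enter() and then crashes, the count stays at 1. Every later env_enter() returns false, so no later process opens with recovery even though the environment may need it.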
-- Brane
---------------------------------------------------------------------
Re: Subversion's use of Berkeley DB [#11511]
Posted by "C. Michael Pilato" <cm...@collab.net>.
Justin Erenkrantz <ju...@erenkrantz.com> writes:
> --On Wednesday, December 8, 2004 12:15 PM -0600 "C. Michael Pilato"
> <cm...@collab.net> wrote:
>
> > My suggestion is that libsvn_fs_base grows the same serialization that
> > mod_db4 uses, which is based around the use of a shared memory segment
> > with a reference count in it. Can we make use of apr_atomics for
> > something like this?
>
> So, would we add a named shared memory segment that resides underneath
> the repository - i.e. the locks subdirectory? If so, then, yes apr's
> shmem routines would be able to map it in.
Nono. This must be done at the libsvn_fs_base level, preferably
stored in the db/ directory.
> However, APR's atomics can't provide any guarantees outside of a
> single process. If the hardware/OS supports atomics, then it does,
> but the APR fallback atomic code relies upon thread mutexes. So, we'd
> have to use some type of file lock as well - perhaps the db.lock we
> already use for recover as well? -- justin
Okay, so no go on the atomics. Locking is fine, but again, must be
done in libsvn_fs_base (not libsvn_repos). In other words, we do this
at the lowest reasonable level at which a program might hook into our
Berkeley DB environment. Obviously, we can't do anything about
programs that try to read our tables directly (bypassing our APIs)
... but shame on those programs anyway.
---------------------------------------------------------------------
Re: Subversion's use of Berkeley DB [#11511]
Posted by Justin Erenkrantz <ju...@erenkrantz.com>.
--On Wednesday, December 8, 2004 12:15 PM -0600 "C. Michael Pilato"
<cm...@collab.net> wrote:
> My suggestion is that libsvn_fs_base grows the same serialization that
> mod_db4 uses, which is based around the use of a shared memory segment
> with a reference count in it. Can we make use of apr_atomics for
> something like this?
So, would we add a named shared memory segment that resides underneath the
repository - i.e. the locks subdirectory? If so, then, yes apr's shmem
routines would be able to map it in.
However, APR's atomics can't provide any guarantees outside of a single
process. If the hardware/OS supports atomics, then it does, but the APR
fallback atomic code relies upon thread mutexes. So, we'd have to use some
type of file lock as well - perhaps the db.lock we already use for recovery as
well? -- justin
---------------------------------------------------------------------
Re: Subversion's use of Berkeley DB [#11511]
Posted by "C. Michael Pilato" <cm...@collab.net>.
Keith Bostic <bo...@abyssinian.sleepycat.com> writes:
> As a springboard for that discussion, I propose we find a
> serialization point for all threads of control using a
> Subversion repository so we can determine if a thread of
> control is the first thread of control entering the database
> environment after a possible application or system failure.
My suggestion is that libsvn_fs_base grows the same serialization that
mod_db4 uses, which is based around the use of a shared memory segment
with a reference count in it. Can we make use of apr_atomics for
something like this?
---------------------------------------------------------------------
Re: Subversion's use of Berkeley DB [#11511]
Posted by Garrett Rooney <ro...@electricjellyfish.net>.
Keith Bostic wrote:
> 2. Subversion users are occasionally seeing "out of memory
> errors". The Subversion code has recently added an error
> callback routine, so future occurrences of this problem
> should result in the detailed Berkeley DB error message being
> available for later debugging.
>
> Given the default 256KB cache size, and using, for example,
> 16KB database page sizes, 8 threads of control in the
> database at the same time, each grabbing 2 pages, will run
> the cache out of room, resulting in this failure. So,
> increasing the cache size may very well fix this problem.
>
> Action Items:
> None at this time.
So you're saying that the amount of cache used is proportional to the
number of concurrent threads accessing the db, and if you run out of
cache things just don't work? That seems less than optimal... Is there
any way to make it allocate more room to the cache dynamically?
-garrett
---------------------------------------------------------------------
Re: Subversion's use of Berkeley DB [#11511]
Posted by Justin Erenkrantz <ju...@erenkrantz.com>.
--On Wednesday, December 8, 2004 12:38 PM -0500 Keith Bostic
<bo...@abyssinian.sleepycat.com> wrote:
> 1. The Subversion code is not setting the Berkeley DB cache size.
> Given Berkeley DB's small default cache size (256KB), and the
> expected good locality of reference for Subversion queries,
> I think Subversion will be able to increase performance by
> setting the cache size.
>
> You can set the cache in the DB_CONFIG file, or by using
> the DbEnv::set_cachesize method:
>
> http://www.sleepycat.com/docs/api_c/env_set_cachesize.html
>
> For more information, see the "Selecting a cache size"
> section of the Berkeley DB Reference Guide, included in your
> download package and also available at:
>
> http://www.sleepycat.com/docs/ref/am_conf/cachesize.html
>
> Action Items:
> Investigate the efficiency of the current Subversion cache
> (using the Berkeley DB db_stat utility), and see if there's
> benefit to be had by increasing the cache size.
>
> Change Subversion to specify a cache size whenever creating
> a Berkeley DB database environment.
The question I have is what's an appropriate cache size? 1M? 2M? 8M? 128M?
Can we change the cache size by just tweaking DB_CONFIG and restarting the
processes? Or, do we need to rebuild the database? (The docs I can find
aren't very helpful on this.)
If it helps, here's the db_stat -m output from our (svn.apache.org) install
using BDB 4.2:
<http://www.apache.org/~jerenkrantz/bdb-db-stat.txt>
This is a fairly loaded public SVN install... -- justin
---------------------------------------------------------------------
Re: Subversion's use of Berkeley DB [#11511]
Posted by "C. Michael Pilato" <cm...@collab.net>.
Justin Erenkrantz <ju...@erenkrantz.com> writes:
> How did George resolve this for mod_db4? (This doesn't directly help
> ra_svn or SSH tunneling though, but can provide us with some
> insights.)
George said (in a private mail to me last week):
I use a shared-memory hash (the mm_hash.[ch] implementation which sits
on top of libmm, but which could sit on something else) that tracks
the reference count on the file. My wrapper around the open()
function in the DB_ENV struct then looks like this:
static int new_db_env_open(DB_ENV *dbenv, const char *db_home,
                           u_int32_t flags, int mode)
{
    int ret = 666;
    DB_ENV *cached_dbenv;

    flags |= DB_INIT_MPOOL;
    /* if global ref count is 0, open for recovery */
    if(global_ref_count_get(db_home) == 0) {
        flags |= DB_RECOVER;
        flags |= DB_INIT_TXN;
        flags |= DB_CREATE;
    }
    if((cached_dbenv = retrieve_db_env(db_home)) != NULL) {
        memcpy(dbenv, cached_dbenv, sizeof(DB_ENV));
        ret = 0;
    }
    else if((ret = old_db_env_open(dbenv, db_home, flags, mode)) == 0) {
        register_db_env(dbenv);
    }
    return ret;
}
If you have a single DBM file for a given subversion instance (I don't
know how svn exactly works internally), you can also just use a sysv
semaphore. The reason I didn't use that in mod_db4 is that it needed
to be able to support simultaneously managing an arbitrary number of
DB_ENVs.
I hope that helps, and I'm happy to participate further in the
discussion if that didn't fully answer your question.
George
---------------------------------------------------------------------
Re: Subversion's use of Berkeley DB [#11511]
Posted by Justin Erenkrantz <ju...@erenkrantz.com>.
--On Wednesday, December 8, 2004 12:38 PM -0500 Keith Bostic
<bo...@abyssinian.sleepycat.com> wrote:
> As a springboard for that discussion, I propose we find a
> serialization point for all threads of control using a
> Subversion repository so we can determine if a thread of
> control is the first thread of control entering the database
> environment after a possible application or system failure.
Well, for mod_dav_svn access (WebDAV), we can have an Apache hook run on
initialization before httpd starts serving pages. So, the only thing we'd
need to do is figure out if some other process (or system) crash occurred that
left it in a potentially goofy (but non-detectable??) state.
Don't I recall you (or someone else) advising that we always run recovery on
process initialization? Would that work here?
How did George resolve this for mod_db4? (This doesn't directly help ra_svn
or SSH tunneling though, but can provide us with some insights.)
Thanks! -- justin
---------------------------------------------------------------------