Posted to dev@subversion.apache.org by Keith Bostic <bo...@abyssinian.sleepycat.com> on 2004/12/08 17:38:38 UTC

Subversion's use of Berkeley DB [#11511]

Hi, my name is Keith Bostic and I'm with Sleepycat Software.

We (Sleepycat Software) are getting beaten up periodically
because Subversion users have problems with Berkeley DB, and I'd
like to see if we can fix that once and for all.  To that end,
I've been talking with Mike Pilato over the past few days about
how Subversion uses Berkeley DB, and where problems might be.

There were three issues we found.  I'm going to describe them
in this email, and I'm happy to answer any questions anyone has.
Then, Mike and I were hoping to find someone willing to sign up
for making whatever code changes are needed in Subversion.

1. The Subversion code is not setting the Berkeley DB cache size.
   Given Berkeley DB's small default cache size (256KB), and the
   expected good locality of reference for Subversion queries,
   I think Subversion will be able to increase performance by
   setting the cache size.

   You can set the cache in the DB_CONFIG file, or by using
   the DbEnv::set_cachesize method:

   http://www.sleepycat.com/docs/api_c/env_set_cachesize.html

   For more information, see the "Selecting a cache size"
   section of the Berkeley DB Reference Guide, included in your
   download package and also available at:

   http://www.sleepycat.com/docs/ref/am_conf/cachesize.html

   Action Items:
   Investigate the efficiency of the current Subversion cache
   (using the Berkeley DB db_stat utility), and see if there's
   benefit to be had by increasing the cache size.

   Change Subversion to specify a cache size whenever creating
   a Berkeley DB database environment.

2. Subversion users are occasionally seeing "out of memory
   errors".  The Subversion code has recently added an error
   callback routine, so future occurrences of this problem
   should result in the detailed Berkeley DB error message being
   available for later debugging.

   Given the default 256KB cache size, and using, for example,
   16KB database page sizes, 8 threads of control in the
   database at the same time, each grabbing 2 pages, will run
   the cache out of room, resulting in this failure.  So,
   increasing the cache size may very well fix this problem.

   Action Items:
   None at this time.

3. Subversion isn't recovering the database after application or
   system failure -- it's only running recovery if Berkeley DB
   explicitly returns DB_RUNRECOVERY.

   This is likely the source of the periodic corruption Subversion
   users have seen.

   The problem is that Subversion is itself a library, with different
   top-layer interfaces, Apache and standalone administrative
   programs among them.  To solve this problem we're going to
   need to find a way for the Subversion library to know if a
   thread of control entering Subversion code is the first
   thread of control to access the Berkeley DB database
   environment so it can run recovery as it opens the database
   environment.

   This is the problem that George Schlossnagle had to solve for
   integrating Berkeley DB with the Apache mod_db4 module, and
   it's a standard problem for Sleepycat Software customers
   using Berkeley DB in multi-process environments.  The fact
   that Subversion is a library, and that the Subversion installation
   cannot modify system startup procedures, complicates things
   somewhat, though.

   There already appears to be some code in Subversion trying
   to know when Subversion is creating a database environment,
   so it may be simpler than we think.

   Action Items:
   This item may need more discussion.

   As a springboard for that discussion, I propose we find a
   serialization point for all threads of control using a
   Subversion repository so we can determine if a thread of
   control is the first thread of control entering the database
   environment after a possible application or system failure.

Regards,
--keith

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Keith Bostic			bostic@sleepycat.com
Sleepycat Software Inc.		keithbosticim (ymsgid)
118 Tower Rd.			+1-781-259-3139
Lincoln, MA 01773		http://www.sleepycat.com


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Subversion's use of Berkeley DB [#11511]

Posted by Greg Hudson <gh...@MIT.EDU>.
On Wed, 2004-12-08 at 12:38, Keith Bostic wrote:
> 3. Subversion isn't recovering the database after application or
>    system failure -- it's only running recovery if Berkeley DB
>    explicitly returns DB_RUNRECOVERY.

I'd like to point out that, in principle, anything Subversion can do to
work around this problem (in its current architecture) could be done by
the Berkeley DB library itself.  Perhaps more robustly, as Subversion's
BDB tables could conceivably be accessed without going through the
Subversion libraries.



Re: Subversion's use of Berkeley DB [#11511]

Posted by Branko Čibej <br...@xbc.nu>.
Philip Martin wrote:

>Philip Martin <ph...@codematters.co.uk> writes:
>
>>Should be simple to implement, but I don't know what sort of
>>performance effect it would have.  Do we really want to run recovery
>>that often?
>
>A quick 'n' dirty implementation
>
>Index: subversion/libsvn_repos/repos.c
>===================================================================
>--- subversion/libsvn_repos/repos.c	(revision 12263)
>+++ subversion/libsvn_repos/repos.c	(working copy)
>@@ -1075,6 +1075,9 @@
>     const char *lockfile_path;
>     svn_error_t *err;
> 
>+    if (! exclusive)
>+      SVN_ERR (svn_repos_recover2 (path, TRUE, NULL, NULL, pool));
>+
>     /* Get a filehandle for the repository's db lockfile. */
>     lockfile_path = svn_repos_db_lockfile (repos, pool);
>
>
>$ time for i in `seq 1 30`;do svn st -u wc > /dev/null;done
>
>Without patch:
>real    0m15.294s
>user    0m2.450s
>sys     0m0.620s
>
>With patch:
>real    0m24.186s
>user    0m3.620s
>sys     0m1.480s

Yes, we definitely don't want to recover every time...

-- Brane




Re: Subversion's use of Berkeley DB [#11511]

Posted by Philip Martin <ph...@codematters.co.uk>.
Philip Martin <ph...@codematters.co.uk> writes:

> Should be simple to implement, but I don't know what sort of
> performance effect it would have.  Do we really want to run recovery
> that often?

A quick 'n' dirty implementation

Index: subversion/libsvn_repos/repos.c
===================================================================
--- subversion/libsvn_repos/repos.c	(revision 12263)
+++ subversion/libsvn_repos/repos.c	(working copy)
@@ -1075,6 +1075,9 @@
     const char *lockfile_path;
     svn_error_t *err;
 
+    if (! exclusive)
+      SVN_ERR (svn_repos_recover2 (path, TRUE, NULL, NULL, pool));
+
     /* Get a filehandle for the repository's db lockfile. */
     lockfile_path = svn_repos_db_lockfile (repos, pool);


$ time for i in `seq 1 30`;do svn st -u wc > /dev/null;done

Without patch:
real    0m15.294s
user    0m2.450s
sys     0m0.620s

With patch:
real    0m24.186s
user    0m3.620s
sys     0m1.480s

-- 
Philip Martin


Re: Subversion's use of Berkeley DB [#11511]

Posted by Branko Čibej <br...@xbc.nu>.
Branko Čibej wrote:

> Keith, in the past I noticed that when a process with that has opened 
> a BDB environment crashes, leving love locks behind, 

Me and my sloppy spelling... "when a process that has opened a BDB 
environment crashes, leaving live locks behind"

-- Brane




Re: Subversion's use of Berkeley DB [#11511]

Posted by Branko Čibej <br...@xbc.nu>.
Philip Martin wrote:

>Branko Čibej <br...@xbc.nu> writes:
>
>>C. Michael Pilato wrote:
>>
>>>Keith Bostic <bo...@abyssinian.sleepycat.com> writes:
>>>
>>>>  As a springboard for that discussion, I propose we find a
>>>>  serialization point for all threads of control using a
>>>>  Subversion repository so we can determine if a thread of
>>>>  control is the first thread of control entering the database
>>>>  environment after a possible application or system failure.
>>>
>>>My suggestion is that libsvn_fs_base grows the same serialization that
>>>mod_db4 uses, which is based around the use of a shared memory segment
>>>with a reference count in it.
>>>
>>There is a pretty fundamental problem with using a reference count
>>like that. If a process that is accessing BDB crashes after having
>>incremented the refcount, the refcount gets out of sync and is
>>useless. Also, other processes that are already running might wedge on
>>a lock owned by the crashed process. I see no way to resolve this.
>
>We have the libsvn_repos lock, which is the current mechanism to
>ensure that recovery gets exclusive access.  Normal read/write
>repository access, such as svn_repos_open(), takes a non-exclusive
>lock, while "special" access, such as svn_repos_recover(), takes an
>exclusive lock.
>
>Suppose svn_repos_open() were first to make a non-blocking attempt to
>take an exclusive lock, if that fails it just carries on as at present
>and tries to take a non-exclusive lock.  If however svn_repos_open()
>manages to take an exclusive lock then it can do whatever it wants
>(perhaps run recovery?) and then it can drop the exclusive lock, take
>a non-exclusive one and continue as at present.
>
>This would mean that recovery would get run whenever anything accessed
>a repository that was not otherwise being accessed.  Is that the
>desired behaviour?  Should be simple to implement, but I don't know
>what sort of performance effect it would have.  Do we really want to
>run recovery that often?

Probably not; some sort of counter would be nice, with a forced check 
every N times (like forced fscks on some filesystems). Of course, we 
should also force a check any time some process gets a DB_RUNRECOVERY 
from BDB.

Keith, in the past I noticed that when a process that has opened a 
BDB environment crashes, leaving live locks behind, other processes that 
are using the same environment may hang indefinitely. This is what 
usually causes SVN repositories to get in a "wedged" state. This leads 
me to believe that relying on BDB to detect this situation and return 
DB_RUNRECOVERY to the other processes isn't reliable.

Am I missing something here, or is using a separate server process with 
exclusive access to the repository truly the only way to completely 
avoid such hangs?


-- Brane




Re: Subversion's use of Berkeley DB [#11511]

Posted by Philip Martin <ph...@codematters.co.uk>.
Branko Čibej <br...@xbc.nu> writes:

> C. Michael Pilato wrote:
>
>>Keith Bostic <bo...@abyssinian.sleepycat.com> writes:
>>
>>>   As a springboard for that discussion, I propose we find a
>>>   serialization point for all threads of control using a
>>>   Subversion repository so we can determine if a thread of
>>>   control is the first thread of control entering the database
>>>   environment after a possible application or system failure.
>>
>>My suggestion is that libsvn_fs_base grows the same serialization that
>>mod_db4 uses, which is based around the use of a shared memory segment
>>with a reference count in it.
>>
> There is a pretty fundamental problem with using a reference count
> like that. If a process that is accessing BDB crashes after having
> incremented the refcount, the refcount gets out of sync and is
> useless. Also, other processes that are already running might wedge on
> a lock owned by the crashed process. I see no way to resolve this.

We have the libsvn_repos lock, which is the current mechanism to
ensure that recovery gets exclusive access.  Normal read/write
repository access, such as svn_repos_open(), takes a non-exclusive
lock, while "special" access, such as svn_repos_recover(), takes an
exclusive lock.

Suppose svn_repos_open() were first to make a non-blocking attempt to
take an exclusive lock, if that fails it just carries on as at present
and tries to take a non-exclusive lock.  If however svn_repos_open()
manages to take an exclusive lock then it can do whatever it wants
(perhaps run recovery?) and then it can drop the exclusive lock, take
a non-exclusive one and continue as at present.

This would mean that recovery would get run whenever anything accessed
a repository that was not otherwise being accessed.  Is that the
desired behaviour?  Should be simple to implement, but I don't know
what sort of performance effect it would have.  Do we really want to
run recovery that often?

-- 
Philip Martin


Re: Subversion's use of Berkeley DB [#11511]

Posted by Branko Čibej <br...@xbc.nu>.
C. Michael Pilato wrote:

>Keith Bostic <bo...@abyssinian.sleepycat.com> writes:
>
>>   As a springboard for that discussion, I propose we find a
>>   serialization point for all threads of control using a
>>   Subversion repository so we can determine if a thread of
>>   control is the first thread of control entering the database
>>   environment after a possible application or system failure.
>
>My suggestion is that libsvn_fs_base grows the same serialization that
>mod_db4 uses, which is based around the use of a shared memory segment
>with a reference count in it.
>
There is a pretty fundamental problem with using a reference count like 
that. If a process that is accessing BDB crashes after having 
incremented the refcount, the refcount gets out of sync and is 
useless. Also, other processes that are already running might wedge on a 
lock owned by the crashed process. I see no way to resolve this.


-- Brane




Re: Subversion's use of Berkeley DB [#11511]

Posted by "C. Michael Pilato" <cm...@collab.net>.
Justin Erenkrantz <ju...@erenkrantz.com> writes:

> --On Wednesday, December 8, 2004 12:15 PM -0600 "C. Michael Pilato"
> <cm...@collab.net> wrote:
> 
> > My suggestion is that libsvn_fs_base grows the same serialization that
> > mod_db4 uses, which is based around the use of a shared memory segment
> > with a reference count in it.  Can we make use of apr_atomics for
> > something like this?
> 
> So, would we add a named shared memory segment that resides underneath
> the repository - i.e. the locks subdirectory?  If so, then yes, APR's
> shmem routines would be able to map it in.

Nono.  This must be done at the libsvn_fs_base level, preferably
stored in the db/ directory.

> However, APR's atomics can't provide any guarantees outside of a
> single process.  If the hardware/OS supports atomics, then it does,
> but the APR fallback atomic code relies upon thread mutexes.  So, we'd
> have to use some type of file lock as well - perhaps the db.lock we
> already use for recover as well?  -- justin

Okay, so no go on the atomics.  Locking is fine, but again, must be
done in libsvn_fs_base (not libsvn_repos).  In other words, we do this
at the lowest reasonable level at which a program might hook into our
Berkeley DB environment.  Obviously, we can't do anything about
programs that try to read our tables directly (bypassing our APIs)
... but shame on those programs anyway.


Re: Subversion's use of Berkeley DB [#11511]

Posted by Justin Erenkrantz <ju...@erenkrantz.com>.
--On Wednesday, December 8, 2004 12:15 PM -0600 "C. Michael Pilato" 
<cm...@collab.net> wrote:

> My suggestion is that libsvn_fs_base grows the same serialization that
> mod_db4 uses, which is based around the use of a shared memory segment
> with a reference count in it.  Can we make use of apr_atomics for
> something like this?

So, would we add a named shared memory segment that resides underneath the 
repository - i.e. the locks subdirectory?  If so, then yes, APR's shmem 
routines would be able to map it in.

However, APR's atomics can't provide any guarantees outside of a single 
process.  If the hardware/OS supports atomics, then it does, but the APR 
fallback atomic code relies upon thread mutexes.  So, we'd have to use some 
type of file lock as well - perhaps the db.lock we already use for recover as 
well?  -- justin


Re: Subversion's use of Berkeley DB [#11511]

Posted by "C. Michael Pilato" <cm...@collab.net>.
Keith Bostic <bo...@abyssinian.sleepycat.com> writes:

>    As a springboard for that discussion, I propose we find a
>    serialization point for all threads of control using a
>    Subversion repository so we can determine if a thread of
>    control is the first thread of control entering the database
>    environment after a possible application or system failure.

My suggestion is that libsvn_fs_base grows the same serialization that
mod_db4 uses, which is based around the use of a shared memory segment
with a reference count in it.  Can we make use of apr_atomics for
something like this?


Re: Subversion's use of Berkeley DB [#11511]

Posted by Garrett Rooney <ro...@electricjellyfish.net>.
Keith Bostic wrote:

> 2. Subversion users are occasionally seeing "out of memory
>    errors".  The Subversion code has recently added an error
>    callback routine, so future occurrences of this problem
>    should result in the detailed Berkeley DB error message being
>    available for later debugging.
> 
>    Given the default 256KB cache size, and using, for example,
>    16KB database page sizes, 8 threads of control in the
>    database at the same time, each grabbing 2 pages, will run
>    the cache out of room, resulting in this failure.  So,
>    increasing the cache size may very well fix this problem.
> 
>    Action Items:
>    None at this time.

So you're saying that the amount of cache used is proportional to the 
number of concurrent threads accessing the db, and if you run out of 
cache, things just don't work?  That seems less than optimal...  Is there 
any way to make it allocate more room to the cache dynamically?

-garrett


Re: Subversion's use of Berkeley DB [#11511]

Posted by Justin Erenkrantz <ju...@erenkrantz.com>.
--On Wednesday, December 8, 2004 12:38 PM -0500 Keith Bostic 
<bo...@abyssinian.sleepycat.com> wrote:

> 1. The Subversion code is not setting the Berkeley DB cache size.
>    Given Berkeley DB's small default cache size (256KB), and the
>    expected good locality of reference for Subversion queries,
>    I think Subversion will be able to increase performance by
>    setting the cache size.
>
>    You can set the cache in the DB_CONFIG file, or by using
>    the DbEnv::set_cachesize method:
>
>    http://www.sleepycat.com/docs/api_c/env_set_cachesize.html
>
>    For more information, see the "Selecting a cache size"
>    section of the Berkeley DB Reference Guide, included in your
>    download package and also available at:
>
>    http://www.sleepycat.com/docs/ref/am_conf/cachesize.html
>
>    Action Items:
>    Investigate the efficiency of the current Subversion cache
>    (using the Berkeley DB db_stat utility), and see if there's
>    benefit to be had by increasing the cache size.
>
>    Change Subversion to specify a cache size whenever creating
>    a Berkeley DB database environment.

The question I have is what's an appropriate cache size?  1M?  2M? 8M?  128M?

Can we change the cache size by just tweaking DB_CONFIG and restarting the 
processes?  Or, do we need to rebuild the database?  (The docs I can find 
aren't very helpful on this.)

If it helps, here's the db_stat -m output from our (svn.apache.org) install 
using BDB 4.2:

<http://www.apache.org/~jerenkrantz/bdb-db-stat.txt>

This is a fairly loaded public SVN install...  -- justin


Re: Subversion's use of Berkeley DB [#11511]

Posted by "C. Michael Pilato" <cm...@collab.net>.
Justin Erenkrantz <ju...@erenkrantz.com> writes:

> How did George resolve this for mod_db4?  (This doesn't directly help
> ra_svn or SSH tunneling though, but can provide us with some
> insights.)

George said (in a private mail to me last week):

   I use a shared-memory hash (the mm_hash.[ch] implementation which sits
   on top of libmm, but which could sit on something else) that tracks
   the reference count on the file.  My wrapper around the open()
   function in the DB_ENV struct then looks like this:
   
   static int new_db_env_open(DB_ENV *dbenv, const char *db_home,
                              u_int32_t flags, int mode)
   {
        int ret = 666;  /* placeholder; every path below assigns it */
        DB_ENV *cached_dbenv;
        flags |= DB_INIT_MPOOL;
        /* if global ref count is 0, open for recovery */
        if(global_ref_count_get(db_home) == 0) {
            flags |= DB_RECOVER;
            flags |= DB_INIT_TXN;
            flags |= DB_CREATE;
        }
        /* reuse a previously registered handle for this home, if any */
        if((cached_dbenv = retrieve_db_env(db_home)) != NULL) {
            memcpy(dbenv, cached_dbenv, sizeof(DB_ENV));
            ret = 0;
        }
        /* otherwise call the original open() and register the result */
        else if((ret = old_db_env_open(dbenv, db_home, flags, mode)) == 0) {
            register_db_env(dbenv);
        }
        return ret;
   }
   
   If you have a single DBM file for a given subversion instance (I don't
   know how svn exactly works internally), you can also just use a sysv
   semaphore.  The reason I didn't use that in mod_db4 is that it needed
   to be able to support simultaneously managing an arbitrary number of
   DB_ENVs.
   
   I hope that helps, and I'm happy to participate further in the
   discussion if that didn't fully answer your question.
   
   George


Re: Subversion's use of Berkeley DB [#11511]

Posted by Justin Erenkrantz <ju...@erenkrantz.com>.
--On Wednesday, December 8, 2004 12:38 PM -0500 Keith Bostic 
<bo...@abyssinian.sleepycat.com> wrote:

>    As a springboard for that discussion, I propose we find a
>    serialization point for all threads of control using a
>    Subversion repository so we can determine if a thread of
>    control is the first thread of control entering the database
>    environment after a possible application or system failure.

Well, for mod_dav_svn access (WebDAV), we can have an Apache hook run on 
initialization before httpd starts serving pages.  So, the only thing we'd 
need to do is figure out if some other process (or system) crash occurred that 
left it in a potentially goofy (but non-detectable??) state.

Don't I recall you (or someone else) advising that we always run recovery on 
process initialization?  Would that work here?

How did George resolve this for mod_db4?  (This doesn't directly help ra_svn 
or SSH tunneling though, but can provide us with some insights.)

Thanks!  -- justin
