You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Philip Martin <ph...@wandisco.com> on 2012/04/02 18:46:02 UTC

Subversion BDB doesn't work with Apache 2.4 event MPM

I am unable to run the Subversion regression tests for BDB and DAV with
Apache 2.4 using the event MPM (the new default MPM in 2.4).  Sometimes
mod_dav_svn SEGVs causing tests to FAIL and sometimes the tests hang
with mod_dav_svn spinning in a loop.  The problems go away when I switch
to the worker MPM (the old default MPM in 2.2).

I'm not sure what the event MPM does that causes the problem.  Here is a
typical SEGV:

#0  0x00007fb2017f8e96 in svn_error_clear (err=0x2031203920312038)
    at ../src/subversion/libsvn_subr/error.c:341
341       while (err->child)
(gdb) p err
$1 = (svn_error_t *) 0x2031203920312038
(gdb) p err[0]
Cannot access memory at address 0x2031203920312038

(gdb) bt
#0  0x00007f497ac4ae96 in svn_error_clear (err=0x2031203920312038)
    at ../src/subversion/libsvn_subr/error.c:341
#1  0x00007f497a7a83e6 in svn_fs_bdb__wrap_db (fs=0x7f49641183c0, 
    operation=0x7f497a7d3e39 "opening 'nodes' table", db_err=0)
    at ../src/subversion/libsvn_fs_base/bdb/bdb-err.c:96
#2  0x00007f497a7b7219 in open_databases (fs=0x7f49641183c0, create=0, 
    format=4, 
    path=0x1f26f70
    "/home/pm/sw/subversion/obj/subversion/tests/cmdline/svn-test-work/repositories/merge_reintegrate_tests-2/db",
    pool=0x7f495c0029b8)
    at ../src/subversion/libsvn_fs_base/fs.c:564
#3  0x00007f497a7b7c27 in base_open (fs=0x7f49641183c0, 
    path=0x1f26f70
    "/home/pm/sw/subversion/obj/subversion/tests/cmdline/svn-test-work/repositories/merge_reintegrate_tests-2/db",
    pool=0x7f495c0029b8, 
    common_pool=0x1aa8ac8) at ../src/subversion/libsvn_fs_base/fs.c:763
#4  0x00007f497b0b65a6 in svn_fs_open (fs_p=0x1f26ea8, 
    path=0x1f26f70
    "/home/pm/sw/subversion/obj/subversion/tests/cmdline/svn-test-work/repositories/merge_reintegrate_tests-2/db",
    fs_config=0x1f26d88, 
    pool=0x7f495c0029b8) at ../src/subversion/libsvn_fs/fs-loader.c:374
#5  0x00007f497b2e78c7 in get_repos (repos_p=0x7f495c03ec28, 
    path=0x7f495c00e978
    "/home/pm/sw/subversion/obj/subversion/tests/cmdline/svn-test-work/repositories/merge_reintegrate_tests-2",
    exclusive=0, 
    nonblocking=0, open_fs=1, fs_config=0x1f26d88, pool=0x7f495c0029b8)
    at ../src/subversion/libsvn_repos/repos.c:1416
#6  0x00007f497b2e7a14 in svn_repos_open2 (repos_p=0x7f495c03ec28, 
    path=0x7f495c00e978
    "/home/pm/sw/subversion/obj/subversion/tests/cmdline/svn-test-work/repositories/merge_reintegrate_tests-2",
    fs_config=0x1f26d88, 
    pool=0x7f495c0029b8) at ../src/subversion/libsvn_repos/repos.c:1462
#7  0x00007f497b51bb1c in get_resource (r=0x7f49640f7080, 
    root_path=0x7f495c00e940
    "/svn-test-work/repositories/merge_reintegrate_tests-2",
    label=0x7f49640f8667 "9", use_checked_in=0, resource=0x7f4976d30b88)
    at ../src/subversion/mod_dav_svn/repos.c:2159
#8  0x00007f497b736b74 in dav_get_resource (r=0x7f49640f7080, 
    label_allowed=<optimized out>, use_checked_in=0,
    res_p=0x7f4976d30b88)
    at /home/pm/sw/httpd/src/modules/dav/main/mod_dav.c:712

-- 
uberSVN: Apache Subversion Made Easy
http://www.uberSVN.com

Re: Subversion BDB doesn't work with Apache 2.4 event MPM

Posted by Daniel Shahaf <da...@elego.de>.
Branko Čibej wrote on Mon, Apr 02, 2012 at 21:04:36 +0200:
> On 02.04.2012 18:46, Philip Martin wrote:
> > I am unable to run the Subversion regression tests for BDB and DAV with
> > Apache 2.4 using the event MPM (the new default MPM in 2.4).  Sometimes
> > mod_dav_svn SEGVs causing tests to FAIL and sometimes the tests hang
> > with mod_dav_svn spinning in a loop.  The problems go away when I switch
> > to the worker MPM (the old default MPM in 2.2).
> >
> > I'm not sure what the event MPM does that causes the problem.  Here is a
> > typical SEGV:
> >
> > #0  0x00007fb2017f8e96 in svn_error_clear (err=0x2031203920312038)
> >     at ../src/subversion/libsvn_subr/error.c:341
> > 341       while (err->child)
> > (gdb) p err
> > $1 = (svn_error_t *) 0x2031203920312038
> > (gdb) p err[0]
> > Cannot access memory at address 0x2031203920312038
> 
> Looks like the DB_REGISTER dance biting us yet again. Pool lifetimes
> most likely, it's a tricky bit of code. Hard to say for certain since I
> don't know how the event MPM works.

http://httpd.apache.org/docs/current/mod/event.html

"""To solve [the 'keep alive problem'], this MPM uses a dedicated thread
to handle both the Listening sockets, all sockets that are in a Keep
Alive state, and sockets where the handler and protocol filters have
done their work and the only remaining thing to do is send the data to
the client"""

> 
> -- Brane
> 

Re: Subversion BDB doesn't work with Apache 2.4 event MPM

Posted by Daniel Shahaf <da...@elego.de>.
Greg Stein wrote on Thu, Apr 05, 2012 at 20:18:38 -0400:
> If we don't find the bugfix doable, then I think we can/should add the
> check that I researched above.
> 

+1

Re: Subversion BDB doesn't work with Apache 2.4 event MPM

Posted by Greg Stein <gs...@gmail.com>.
On Apr 5, 2012 4:16 PM, "Daniel Shahaf" <da...@elego.de> wrote:
>...
> Yeah, the alternative is to do it at compile-time and disallow
> the `--with-berkeley-db --with-apxs=2.4' combination --- even for people
> who don't have BDB repositories.

Not sure about this; I think the get_resource() fix would be preferable.

> My only issue with implementing this is whether it's easier to just fix
> underlying bug, in which case these version checks won't be needed.

I concur.

If we don't find the bugfix doable, then I think we can/should add the
check that I researched above.

Cheers,
-g

Re: Subversion BDB doesn't work with Apache 2.4 event MPM

Posted by Daniel Shahaf <da...@elego.de>.
Greg Stein wrote on Thu, Apr 05, 2012 at 15:02:35 -0400:
> On Thu, Apr 5, 2012 at 14:46, Greg Stein <gs...@gmail.com> wrote:
> > On Thu, Apr 5, 2012 at 14:21, Daniel Shahaf <da...@elego.de> wrote:
> >> Greg Stein wrote on Thu, Apr 05, 2012 at 14:14:53 -0400:
> >>> On Apr 5, 2012 2:10 PM, "Daniel Shahaf" <da...@elego.de> wrote:
> >>> >
> >>> > Greg Stein wrote on Thu, Apr 05, 2012 at 14:05:31 -0400:
> >>> > > On Apr 5, 2012 6:46 AM, "Philip Martin" <ph...@wandisco.com>
> >>> wrote:
> >>> > ....
> >>> > > >
> >>> > > > I've raised issue 4157 to track this problem.
> >>> > >
> >>> > > Can we at least implement a detection for this situation and fail to
> >>> start?
> >>> > > Maybe mod_dav_svn would query the mpm type, and the fs type. We could at
> >>> > > least backport that to 1.7.x pretty quickly.
> >>> > >
> >>> >
> >>> > No need.  1.7.x doesn't compile against 2.4 for other reasons (API
> >>> > changes).  I've patched them on trunk but vetoed the backport pending
> >>> > resolution of the BDB hangs.
> >>>
> >>> That prevents somebody from using BDB with the 2.4 worker mpm.
> >>
> >> Philip reported seeing tests pass with unmodified HEAD on worker.
> >>
> >> Anyway I won't object to adding a version check on httpd (at configure-
> >> or run-time).  We know it'll hang..
> >
> > It would need to be at run-time since the MPM type and the FS type is
> > not known at configuration time.
> >
> > Oh, shoot... there might be multiple repositories, with mixed types.
> > Thus, it is really a per-request check. I wonder if we can even ask
> > the FS what type it is without opening/allocating this per-thread
> > stuff. Even if we could, it would imply needing to send a 500 response
> > back to the client. That's not a very nice situation :-/
> >

Yeah, the alternative is to do it at compile-time and disallow
the `--with-berkeley-db --with-apxs=2.4' combination --- even for people
who don't have BDB repositories.

My only issue with implementing this is whether it's easier to just fix
underlying bug, in which case these version checks won't be needed.

> > Ah. I see it: svn_fs_type(). A call to that in
> > mod_dav_svn/repos.c:get_resource(), and we can bail out with an error.
> > I'm just not sure what values to examine in some ap_mpm_query() calls
> > (ref: ap_mpm.h). There may be a single query (eg. AP_MPMQ_IS_ASYNC)
> > that identifies the event MPM.
> 
> Found the best way: strcmp(ap_show_mpm(), "event")

Re: Subversion BDB doesn't work with Apache 2.4 event MPM

Posted by Greg Stein <gs...@gmail.com>.
On Thu, Apr 5, 2012 at 14:46, Greg Stein <gs...@gmail.com> wrote:
> On Thu, Apr 5, 2012 at 14:21, Daniel Shahaf <da...@elego.de> wrote:
>> Greg Stein wrote on Thu, Apr 05, 2012 at 14:14:53 -0400:
>>> On Apr 5, 2012 2:10 PM, "Daniel Shahaf" <da...@elego.de> wrote:
>>> >
>>> > Greg Stein wrote on Thu, Apr 05, 2012 at 14:05:31 -0400:
>>> > > On Apr 5, 2012 6:46 AM, "Philip Martin" <ph...@wandisco.com>
>>> wrote:
>>> > ....
>>> > > >
>>> > > > I've raised issue 4157 to track this problem.
>>> > >
>>> > > Can we at least implement a detection for this situation and fail to
>>> start?
>>> > > Maybe mod_dav_svn would query the mpm type, and the fs type. We could at
>>> > > least backport that to 1.7.x pretty quickly.
>>> > >
>>> >
>>> > No need.  1.7.x doesn't compile against 2.4 for other reasons (API
>>> > changes).  I've patched them on trunk but vetoed the backport pending
>>> > resolution of the BDB hangs.
>>>
>>> That prevents somebody from using BDB with the 2.4 worker mpm.
>>
>> Philip reported seeing tests pass with unmodified HEAD on worker.
>>
>> Anyway I won't object to adding a version check on httpd (at configure-
>> or run-time).  We know it'll hang..
>
> It would need to be at run-time since the MPM type and the FS type is
> not known at configuration time.
>
> Oh, shoot... there might be multiple repositories, with mixed types.
> Thus, it is really a per-request check. I wonder if we can even ask
> the FS what type it is without opening/allocating this per-thread
> stuff. Even if we could, it would imply needing to send a 500 response
> back to the client. That's not a very nice situation :-/
>
> Ah. I see it: svn_fs_type(). A call to that in
> mod_dav_svn/repos.c:get_resource(), and we can bail out with an error.
> I'm just not sure what values to examine in some ap_mpm_query() calls
> (ref: ap_mpm.h). There may be a single query (eg. AP_MPMQ_IS_ASYNC)
> that identifies the event MPM.

Found the best way: strcmp(ap_show_mpm(), "event")

Re: Subversion BDB doesn't work with Apache 2.4 event MPM

Posted by Greg Stein <gs...@gmail.com>.
On Thu, Apr 5, 2012 at 14:21, Daniel Shahaf <da...@elego.de> wrote:
> Greg Stein wrote on Thu, Apr 05, 2012 at 14:14:53 -0400:
>> On Apr 5, 2012 2:10 PM, "Daniel Shahaf" <da...@elego.de> wrote:
>> >
>> > Greg Stein wrote on Thu, Apr 05, 2012 at 14:05:31 -0400:
>> > > On Apr 5, 2012 6:46 AM, "Philip Martin" <ph...@wandisco.com>
>> wrote:
>> > ....
>> > > >
>> > > > I've raised issue 4157 to track this problem.
>> > >
>> > > Can we at least implement a detection for this situation and fail to
>> start?
>> > > Maybe mod_dav_svn would query the mpm type, and the fs type. We could at
>> > > least backport that to 1.7.x pretty quickly.
>> > >
>> >
>> > No need.  1.7.x doesn't compile against 2.4 for other reasons (API
>> > changes).  I've patched them on trunk but vetoed the backport pending
>> > resolution of the BDB hangs.
>>
>> That prevents somebody from using BDB with the 2.4 worker mpm.
>
> Philip reported seeing tests pass with unmodified HEAD on worker.
>
> Anyway I won't object to adding a version check on httpd (at configure-
> or run-time).  We know it'll hang..

It would need to be at run-time since the MPM type and the FS type is
not known at configuration time.

Oh, shoot... there might be multiple repositories, with mixed types.
Thus, it is really a per-request check. I wonder if we can even ask
the FS what type it is without opening/allocating this per-thread
stuff. Even if we could, it would imply needing to send a 500 response
back to the client. That's not a very nice situation :-/

Ah. I see it: svn_fs_type(). A call to that in
mod_dav_svn/repos.c:get_resource(), and we can bail out with an error.
I'm just not sure what values to examine in some ap_mpm_query() calls
(ref: ap_mpm.h). There may be a single query (eg. AP_MPMQ_IS_ASYNC)
that identifies the event MPM.

Cheers,
-g

Re: Subversion BDB doesn't work with Apache 2.4 event MPM

Posted by Daniel Shahaf <da...@elego.de>.
Greg Stein wrote on Thu, Apr 05, 2012 at 14:14:53 -0400:
> On Apr 5, 2012 2:10 PM, "Daniel Shahaf" <da...@elego.de> wrote:
> >
> > Greg Stein wrote on Thu, Apr 05, 2012 at 14:05:31 -0400:
> > > On Apr 5, 2012 6:46 AM, "Philip Martin" <ph...@wandisco.com>
> wrote:
> > ....
> > > >
> > > > I've raised issue 4157 to track this problem.
> > >
> > > Can we at least implement a detection for this situation and fail to
> start?
> > > Maybe mod_dav_svn would query the mpm type, and the fs type. We could at
> > > least backport that to 1.7.x pretty quickly.
> > >
> >
> > No need.  1.7.x doesn't compile against 2.4 for other reasons (API
> > changes).  I've patched them on trunk but vetoed the backport pending
> > resolution of the BDB hangs.
> 
> That prevents somebody from using BDB with the 2.4 worker mpm.

Philip reported seeing tests pass with unmodified HEAD on worker.

Anyway I won't object to adding a version check on httpd (at configure-
or run-time).  We know it'll hang..

Re: Subversion BDB doesn't work with Apache 2.4 event MPM

Posted by Philip Martin <ph...@wandisco.com>.
Greg Stein <gs...@gmail.com> writes:

> On Apr 5, 2012 2:10 PM, "Daniel Shahaf" <da...@elego.de> wrote:
>>
>> Greg Stein wrote on Thu, Apr 05, 2012 at 14:05:31 -0400:
>> > On Apr 5, 2012 6:46 AM, "Philip Martin" <ph...@wandisco.com>
> wrote:
>> ....
>> > >
>> > > I've raised issue 4157 to track this problem.
>> >
>> > Can we at least implement a detection for this situation and fail
>> > to start?  Maybe mod_dav_svn would query the mpm type, and the fs
>> > type. We could at least backport that to 1.7.x pretty quickly.

The FS type of what?  A repository?  I suppose mod_dav_svn could go
looking for repositories in the various Locations at startup, but
generally we don't get to see a repository until the first request.
Or the FS type of the FS modules?

>> No need.  1.7.x doesn't compile against 2.4 for other reasons (API
>> changes).  I've patched them on trunk but vetoed the backport pending
>> resolution of the BDB hangs.
>
> That prevents somebody from using BDB with the 2.4 worker mpm.

The 2.4 worker MPM doesn't trigger the assertion I added or any valgrind
errors so it doesn't have the problem at present.  Obviously that could
change.  The problem may well exist with 1.7 and the 2.2. event MPM, but
that MPM is 'experimental'.

-- 
uberSVN: Apache Subversion Made Easy
http://www.uberSVN.com

Re: Subversion BDB doesn't work with Apache 2.4 event MPM

Posted by Greg Stein <gs...@gmail.com>.
On Apr 5, 2012 2:10 PM, "Daniel Shahaf" <da...@elego.de> wrote:
>
> Greg Stein wrote on Thu, Apr 05, 2012 at 14:05:31 -0400:
> > On Apr 5, 2012 6:46 AM, "Philip Martin" <ph...@wandisco.com>
wrote:
> ....
> > >
> > > I've raised issue 4157 to track this problem.
> >
> > Can we at least implement a detection for this situation and fail to
start?
> > Maybe mod_dav_svn would query the mpm type, and the fs type. We could at
> > least backport that to 1.7.x pretty quickly.
> >
>
> No need.  1.7.x doesn't compile against 2.4 for other reasons (API
> changes).  I've patched them on trunk but vetoed the backport pending
> resolution of the BDB hangs.

That prevents somebody from using BDB with the 2.4 worker mpm.

Re: Subversion BDB doesn't work with Apache 2.4 event MPM

Posted by Daniel Shahaf <da...@elego.de>.
Greg Stein wrote on Thu, Apr 05, 2012 at 14:05:31 -0400:
> On Apr 5, 2012 6:46 AM, "Philip Martin" <ph...@wandisco.com> wrote:
> >
> > Branko Čibej <br...@apache.org> writes:
> >
> > >> ... I think I may have identified the problem.  The patch below checks
> > >> that the error struct is allocated and released by the same thread.
> > >> With the worker MPM the assertion always passes but with the event MPM
> I
> > >> get assertion failures.  I don't fully understand the effect this would
> > >> have but I suspect it explains the problem.
> > >
> > > I think this pretty much explains what you're seeing.
> >
> > Yes.  Thread A allocates the memory, stores the address in the
> > thread-specific data, sets the refcount to one and sets up the pool
> > cleanup handler.  Thread A finishes with the request.  Thread B runs the
> > pool cleanup handler, sets the refcount to zero, frees the memory and
> > sets the *wrong* thread-specific data to NULL.  Thread A then gets the
> > free'd address from the thread-specific data and continues to use
> > it--all sorts of horrible things follow.
> >
> > I've raised issue 4157 to track this problem.
> 
> Can we at least implement a detection for this situation and fail to start?
> Maybe mod_dav_svn would query the mpm type, and the fs type. We could at
> least backport that to 1.7.x pretty quickly.
> 

No need.  1.7.x doesn't compile against 2.4 for other reasons (API
changes).  I've patched them on trunk but vetoed the backport pending
resolution of the BDB hangs.

> Cheers,
> -g

Re: Subversion BDB doesn't work with Apache 2.4 event MPM

Posted by Greg Stein <gs...@gmail.com>.
On Apr 5, 2012 6:46 AM, "Philip Martin" <ph...@wandisco.com> wrote:
>
> Branko Čibej <br...@apache.org> writes:
>
> >> ... I think I may have identified the problem.  The patch below checks
> >> that the error struct is allocated and released by the same thread.
> >> With the worker MPM the assertion always passes but with the event MPM
I
> >> get assertion failures.  I don't fully understand the effect this would
> >> have but I suspect it explains the problem.
> >
> > I think this pretty much explains what you're seeing.
>
> Yes.  Thread A allocates the memory, stores the address in the
> thread-specific data, sets the refcount to one and sets up the pool
> cleanup handler.  Thread A finishes with the request.  Thread B runs the
> pool cleanup handler, sets the refcount to zero, frees the memory and
> sets the *wrong* thread-specific data to NULL.  Thread A then gets the
> free'd address from the thread-specific data and continues to use
> it--all sorts of horrible things follow.
>
> I've raised issue 4157 to track this problem.

Can we at least implement a detection for this situation and fail to start?
Maybe mod_dav_svn would query the mpm type, and the fs type. We could at
least backport that to 1.7.x pretty quickly.

Cheers,
-g

Re: Subversion BDB doesn't work with Apache 2.4 event MPM

Posted by Philip Martin <ph...@wandisco.com>.
Branko Čibej <br...@apache.org> writes:

>> ... I think I may have identified the problem.  The patch below checks
>> that the error struct is allocated and released by the same thread.
>> With the worker MPM the assertion always passes but with the event MPM I
>> get assertion failures.  I don't fully understand the effect this would
>> have but I suspect it explains the problem.
>
> I think this pretty much explains what you're seeing.

Yes.  Thread A allocates the memory, stores the address in the
thread-specific data, sets the refcount to one and sets up the pool
cleanup handler.  Thread A finishes with the request.  Thread B runs the
pool cleanup handler, sets the refcount to zero, frees the memory and
sets the *wrong* thread-specific data to NULL.  Thread A then gets the
free'd address from the thread-specific data and continues to use
it--all sorts of horrible things follow.

I've raised issue 4157 to track this problem.

-- 
uberSVN: Apache Subversion Made Easy
http://www.uberSVN.com

Re: Subversion BDB doesn't work with Apache 2.4 event MPM

Posted by Branko Čibej <br...@apache.org>.
On 05.04.2012 00:44, Philip Martin wrote:
> Philip Martin <ph...@wandisco.com> writes:
>
>> I think there is a refcount/locking bug in svn_fs_bdb__close.  This code
>>
>> -  if (0 == --bdb_baton->error_info->refcount && bdb->pool)
>> -    {
>> -      svn_error_clear(bdb_baton->error_info->pending_errors);
>> -#if APR_HAS_THREADS
>> -      free(bdb_baton->error_info);
>> -      apr_threadkey_private_set(NULL, bdb->error_info);
>> -#endif
>> -    }
>>
>> should be inside svn_fs_bdb__close_internal protected by the
>> bdb_cache_lock otherwise the error_info refcount can change while
>> another thread is inside svn_fs_bdb__open_internal and holding the lock.
> That's not correct--the error data is per-thread and so there should be
> no race.  Except ...
>
>> However moving the code from __close to __close_internal so it is inside
>> the lock doesn't stop the tests failing so there must be a second bug
>> somewhere.
> ... I think I may have identified the problem.  The patch below checks
> that the error struct is allocated and released by the same thread.
> With the worker MPM the assertion always passes but with the event MPM I
> get assertion failures.  I don't fully understand the effect this would
> have but I suspect it explains the problem.

I think this pretty much explains what you're seeing. It was years ago,
but I do recall that, when I implemented DB_REGISTER handling, I made
the internal BDB handles thread-specific, which means that you aren't
allowed to use the same handle from different threads at any time --
that's a side effect of using thread-specific storage for certain
"global" data.

If event MPM hands off requests between different threads, than either
mod_dav_svn has to make sure that only the actual worker thread opens
the FS handle, or the BDB code itself needs to change to account for the
difference between "specific to one thread" and "one thread at a time."

Sorry but I've no idea which approach would be easier, apart from having
a feeling of dread sweep over me at the idea of touching that BDB handle
management code. :)

-- Brane


Re: Subversion BDB doesn't work with Apache 2.4 event MPM

Posted by Philip Martin <ph...@wandisco.com>.
Philip Martin <ph...@wandisco.com> writes:

> I think there is a refcount/locking bug in svn_fs_bdb__close.  This code
>
> -  if (0 == --bdb_baton->error_info->refcount && bdb->pool)
> -    {
> -      svn_error_clear(bdb_baton->error_info->pending_errors);
> -#if APR_HAS_THREADS
> -      free(bdb_baton->error_info);
> -      apr_threadkey_private_set(NULL, bdb->error_info);
> -#endif
> -    }
>
> should be inside svn_fs_bdb__close_internal protected by the
> bdb_cache_lock otherwise the error_info refcount can change while
> another thread is inside svn_fs_bdb__open_internal and holding the lock.

That's not correct--the error data is per-thread and so there should be
no race.  Except ...

> However moving the code from __close to __close_internal so it is inside
> the lock doesn't stop the tests failing so there must be a second bug
> somewhere.

... I think I may have identified the problem.  The patch below checks
that the error struct is allocated and released by the same thread.
With the worker MPM the assertion always passes but with the event MPM I
get assertion failures.  I don't fully understand the effect this would
have but I suspect it explains the problem.

Index: subversion/libsvn_fs_base/bdb/env.c
===================================================================
--- subversion/libsvn_fs_base/bdb/env.c	(revision 1309593)
+++ subversion/libsvn_fs_base/bdb/env.c	(working copy)
@@ -186,6 +186,7 @@ get_error_info(const bdb_env_t *bdb)
   if (!priv)
     {
       priv = calloc(1, sizeof(bdb_error_info_t));
+      ((struct bdb_error_info_t *)priv)->id = pthread_self();
       apr_threadkey_private_set(priv, bdb->error_info);
     }
   return priv;
@@ -524,6 +525,7 @@ svn_fs_bdb__close(bdb_env_baton_t *bdb_baton)
 
   SVN_ERR_ASSERT(bdb_baton->env == bdb_baton->bdb->env);
   SVN_ERR_ASSERT(bdb_baton->error_info->refcount > 0);
+  SVN_ERR_ASSERT(pthread_equal(bdb_baton->error_info->id, pthread_self()));
 
   /* Neutralize bdb_baton's pool cleanup to prevent double-close. See
      cleanup_env_baton(). */
Index: subversion/libsvn_fs_base/bdb/env.h
===================================================================
--- subversion/libsvn_fs_base/bdb/env.h	(revision 1309593)
+++ subversion/libsvn_fs_base/bdb/env.h	(working copy)
@@ -71,6 +71,8 @@ typedef struct bdb_error_info_t
      instances that refer to this object. */
   unsigned refcount;
 
+  pthread_t id;
+
 } bdb_error_info_t;
 

-- 
uberSVN: Apache Subversion Made Easy
http://www.uberSVN.com

Re: Subversion BDB doesn't work with Apache 2.4 event MPM

Posted by Philip Martin <ph...@wandisco.com>.
Philip Martin <ph...@wandisco.com> writes:

> ==10434== Thread 15:
> ==10434== Invalid read of size 4
> ==10434==    at 0x802D5BB: svn_fs_bdb__open_internal (env.c:660)
> ==10434==    by 0x802D679: svn_fs_bdb__open (env.c:672)
> ==10434==    by 0x80390D7: open_databases (fs.c:536)
> ==10434==    by 0x8039C26: base_open (fs.c:763)
> ==10434==    by 0x77445A5: svn_fs_open (fs-loader.c:374)
> ==10434==    by 0x752D8C6: get_repos (repos.c:1416)
> ==10434==    by 0x752DA13: svn_repos_open2 (repos.c:1462)
> ==10434==    by 0x72EBB1B: get_resource (repos.c:2159)
> ==10434==    by 0x70B7B73: dav_get_resource (mod_dav.c:712)
> ==10434==    by 0x70BC768: dav_method_options (mod_dav.c:1602)
> ==10434==    by 0x70BDAE7: dav_handler (mod_dav.c:4706)
> ==10434==    by 0x44BBBF: ap_run_handler (config.c:169)
> ==10434==  Address 0x17a0b690 is 16 bytes inside a block of size 24 free'd
> ==10434==    at 0x4C240FD: free (vg_replace_malloc.c:366)
> ==10434==    by 0x802D0BF: svn_fs_bdb__close (env.c:539)
> ==10434==    by 0x8038AAA: cleanup_fs (fs.c:183)
> ==10434==    by 0x8038B36: cleanup_fs_apr (fs.c:289)
> ==10434==    by 0x508DBCD: apr_pool_clear (apr_pools.c:2359)
> ==10434==    by 0x669ADE3: process_lingering_close (event.c:1253)
> ==10434==    by 0x669B987: listener_thread (event.c:1485)
> ==10434==    by 0x58F18C9: start_thread (pthread_create.c:300)
> ==10434==    by 0x600286C: clone (clone.S:112)

I think there is a refcount/locking bug in svn_fs_bdb__close.  This code

-  if (0 == --bdb_baton->error_info->refcount && bdb->pool)
-    {
-      svn_error_clear(bdb_baton->error_info->pending_errors);
-#if APR_HAS_THREADS
-      free(bdb_baton->error_info);
-      apr_threadkey_private_set(NULL, bdb->error_info);
-#endif
-    }

should be inside svn_fs_bdb__close_internal protected by the
bdb_cache_lock otherwise the error_info refcount can change while
another thread is inside svn_fs_bdb__open_internal and holding the lock.

However moving the code from __close to __close_internal so it is inside
the lock doesn't stop the tests failing so there must be a second bug
somewhere.

-- 
uberSVN: Apache Subversion Made Easy
http://www.uberSVN.com

Re: Subversion BDB doesn't work with Apache 2.4 event MPM

Posted by Philip Martin <ph...@wandisco.com>.
Philip Martin <ph...@wandisco.com> writes:

> I'm not familiar with this code.  I can "fix" the problem by removing
> the free() from svn_fs_bdb__close but that's not really surprising given
> the warnings above, nor is it a solution.  The patch below also appears
> to fix the problem but I'm reluctant to commit it until I understand why
> this problem only occurs with the event MPM.
>
> Index: subversion/libsvn_fs_base/bdb/env.c
> ===================================================================
> --- subversion/libsvn_fs_base/bdb/env.c	(revision 1308769)
> +++ subversion/libsvn_fs_base/bdb/env.c	(working copy)
> @@ -517,6 +517,9 @@ svn_fs_bdb__close_internal(bdb_env_t *bdb)
>    return svn_error_trace(err);
>  }
>  
> +static apr_status_t
> +cleanup_env_baton(void *data);
> +
>  svn_error_t *
>  svn_fs_bdb__close(bdb_env_baton_t *bdb_baton)
>  {
> @@ -524,6 +527,9 @@ svn_fs_bdb__close(bdb_env_baton_t *bdb_baton)
>  
>    SVN_ERR_ASSERT(bdb_baton->env == bdb_baton->bdb->env);
>  
> +  apr_pool_cleanup_kill(bdb_baton->pool_for_cleanup, bdb_baton,
> +                        cleanup_env_baton);
> +
>    /* Neutralize bdb_baton's pool cleanup to prevent double-close. See
>       cleanup_env_baton(). */
>    bdb_baton->bdb = NULL;
> @@ -660,6 +666,7 @@ svn_fs_bdb__open_internal(bdb_env_baton_t **bdb_ba
>    ++(*bdb_batonp)->error_info->refcount;
>    apr_pool_cleanup_register(pool, *bdb_batonp, cleanup_env_baton,
>                              apr_pool_cleanup_null);
> +  (*bdb_batonp)->pool_for_cleanup = pool;
>  
>    return SVN_NO_ERROR;
>  }
> Index: subversion/libsvn_fs_base/bdb/env.h
> ===================================================================
> --- subversion/libsvn_fs_base/bdb/env.h	(revision 1308769)
> +++ subversion/libsvn_fs_base/bdb/env.h	(working copy)
> @@ -86,6 +86,9 @@ typedef struct bdb_env_baton_t
>  
>    /* The error info related to this baton. */
>    bdb_error_info_t *error_info;
> +
> +  apr_pool_t *pool_for_cleanup;
> +
>  } bdb_env_baton_t;

That makes the tests run a bit further but they still fail, valgrind
shows:

==22500== Thread 15:
==22500== Invalid read of size 4
==22500==    at 0x802D639: svn_fs_bdb__open_internal (env.c:663)
==22500==    by 0x802D706: svn_fs_bdb__open (env.c:676)
==22500==    by 0x8039167: open_databases (fs.c:536)
==22500==    by 0x8039CB6: base_open (fs.c:763)
==22500==    by 0x77445A5: svn_fs_open (fs-loader.c:374)
==22500==    by 0x752D8C6: get_repos (repos.c:1416)
==22500==    by 0x752DA13: svn_repos_open2 (repos.c:1462)
==22500==    by 0x72EBB1B: get_resource (repos.c:2159)
==22500==    by 0x70B7B73: dav_get_resource (mod_dav.c:712)
==22500==    by 0x70B8FB1: dav_method_propfind (mod_dav.c:2010)
==22500==    by 0x70BDAF7: dav_handler (mod_dav.c:4710)
==22500==    by 0x44BBBF: ap_run_handler (config.c:169)
==22500==  Address 0x184d9500 is 16 bytes inside a block of size 24 free'd
==22500==    at 0x4C240FD: free (vg_replace_malloc.c:366)
==22500==    by 0x802D179: svn_fs_bdb__close (env.c:555)
==22500==    by 0x8038B3A: cleanup_fs (fs.c:183)
==22500==    by 0x8038BC6: cleanup_fs_apr (fs.c:289)
==22500==    by 0x508DBCD: apr_pool_clear (apr_pools.c:2359)
==22500==    by 0x669ADE3: process_lingering_close (event.c:1253)
==22500==    by 0x669B987: listener_thread (event.c:1485)
==22500==    by 0x58F18C9: start_thread (pthread_create.c:300)
==22500==    by 0x600286C: clone (clone.S:112)
==22500== 

Which looks like a similar problem with another pool cleanup at a higher
level.

-- 
uberSVN: Apache Subversion Made Easy
http://www.uberSVN.com

Re: Subversion BDB doesn't work with Apache 2.4 event MPM

Posted by Philip Martin <ph...@wandisco.com>.
Philip Martin <ph...@wandisco.com> writes:

> ==10434== Thread 15:
> ==10434== Invalid read of size 4
> ==10434==    at 0x802D5BB: svn_fs_bdb__open_internal (env.c:660)
> ==10434==    by 0x802D679: svn_fs_bdb__open (env.c:672)
> ==10434==    by 0x80390D7: open_databases (fs.c:536)
> ==10434==    by 0x8039C26: base_open (fs.c:763)
> ==10434==    by 0x77445A5: svn_fs_open (fs-loader.c:374)
> ==10434==    by 0x752D8C6: get_repos (repos.c:1416)
> ==10434==    by 0x752DA13: svn_repos_open2 (repos.c:1462)
> ==10434==    by 0x72EBB1B: get_resource (repos.c:2159)
> ==10434==    by 0x70B7B73: dav_get_resource (mod_dav.c:712)
> ==10434==    by 0x70BC768: dav_method_options (mod_dav.c:1602)
> ==10434==    by 0x70BDAE7: dav_handler (mod_dav.c:4706)
> ==10434==    by 0x44BBBF: ap_run_handler (config.c:169)
> ==10434==  Address 0x17a0b690 is 16 bytes inside a block of size 24 free'd
> ==10434==    at 0x4C240FD: free (vg_replace_malloc.c:366)
> ==10434==    by 0x802D0BF: svn_fs_bdb__close (env.c:539)
> ==10434==    by 0x8038AAA: cleanup_fs (fs.c:183)
> ==10434==    by 0x8038B36: cleanup_fs_apr (fs.c:289)
> ==10434==    by 0x508DBCD: apr_pool_clear (apr_pools.c:2359)
> ==10434==    by 0x669ADE3: process_lingering_close (event.c:1253)
> ==10434==    by 0x669B987: listener_thread (event.c:1485)
> ==10434==    by 0x58F18C9: start_thread (pthread_create.c:300)
> ==10434==    by 0x600286C: clone (clone.S:112)

I'm not familiar with this code.  I can "fix" the problem by removing
the free() from svn_fs_bdb__close but that's not really surprising given
the warnings above, nor is it a solution.  The patch below also appears
to fix the problem but I'm reluctant to commit it until I understand why
this problem only occurs with the event MPM.

Index: subversion/libsvn_fs_base/bdb/env.c
===================================================================
--- subversion/libsvn_fs_base/bdb/env.c	(revision 1308769)
+++ subversion/libsvn_fs_base/bdb/env.c	(working copy)
@@ -517,6 +517,9 @@ svn_fs_bdb__close_internal(bdb_env_t *bdb)
   return svn_error_trace(err);
 }
 
+static apr_status_t
+cleanup_env_baton(void *data);
+
 svn_error_t *
 svn_fs_bdb__close(bdb_env_baton_t *bdb_baton)
 {
@@ -524,6 +527,9 @@ svn_fs_bdb__close(bdb_env_baton_t *bdb_baton)
 
   SVN_ERR_ASSERT(bdb_baton->env == bdb_baton->bdb->env);
 
+  apr_pool_cleanup_kill(bdb_baton->pool_for_cleanup, bdb_baton,
+                        cleanup_env_baton);
+
   /* Neutralize bdb_baton's pool cleanup to prevent double-close. See
      cleanup_env_baton(). */
   bdb_baton->bdb = NULL;
@@ -660,6 +666,7 @@ svn_fs_bdb__open_internal(bdb_env_baton_t **bdb_ba
   ++(*bdb_batonp)->error_info->refcount;
   apr_pool_cleanup_register(pool, *bdb_batonp, cleanup_env_baton,
                             apr_pool_cleanup_null);
+  (*bdb_batonp)->pool_for_cleanup = pool;
 
   return SVN_NO_ERROR;
 }
Index: subversion/libsvn_fs_base/bdb/env.h
===================================================================
--- subversion/libsvn_fs_base/bdb/env.h	(revision 1308769)
+++ subversion/libsvn_fs_base/bdb/env.h	(working copy)
@@ -86,6 +86,9 @@ typedef struct bdb_env_baton_t
 
   /* The error info related to this baton. */
   bdb_error_info_t *error_info;
+
+  apr_pool_t *pool_for_cleanup;
+
 } bdb_env_baton_t;
 

-- 
uberSVN: Apache Subversion Made Easy
http://www.uberSVN.com

Re: Subversion BDB doesn't work with Apache 2.4 event MPM

Posted by Philip Martin <ph...@wandisco.com>.
Branko Čibej <br...@apache.org> writes:

> On 02.04.2012 18:46, Philip Martin wrote:
>> I am unable to run the Subversion regression tests for BDB and DAV with
>> Apache 2.4 using the event MPM (the new default MPM in 2.4).  Sometimes
>> mod_dav_svn SEGVs causing tests to FAIL and sometimes the tests hang
>> with mod_dav_svn spinning in a loop.  The problems go away when I switch
>> to the worker MPM (the old default MPM in 2.2).
>>
>> I'm not sure what the event MPM does that causes the problem.  Here is a
>> typical SEGV:
>>
>> #0  0x00007fb2017f8e96 in svn_error_clear (err=0x2031203920312038)
>>     at ../src/subversion/libsvn_subr/error.c:341
>> 341       while (err->child)
>> (gdb) p err
>> $1 = (svn_error_t *) 0x2031203920312038
>> (gdb) p err[0]
>> Cannot access memory at address 0x2031203920312038
>
> Looks like the DB_REGISTER dance biting us yet again. Pool lifetimes
> most likely, it's a tricky bit of code. Hard to say for certain since I
> don't know how the event MPM works.

I normally use pool debugging with valgrind as that catches more and
earlier errors.  This causes almost all the tests to FAIL without
Subversion appearing the stack trace:

==28130== Thread 28:
==28130== Invalid read of size 8
==28130==    at 0x669CE1B: process_lingering_close (event.c:1254)
==28130==    by 0x669DA9F: listener_thread (event.c:1485)
==28130==    by 0x58F38C9: start_thread (pthread_create.c:300)
==28130==    by 0x600486C: clone (clone.S:112)
==28130==  Address 0x63c4150 is 32 bytes inside a block of size 88 free'd
==28130==    at 0x4C240FD: free (vg_replace_malloc.c:366)
==28130==    by 0x508D7E8: pool_clear_debug (apr_pools.c:1588)
==28130==    by 0x508DA61: apr_pool_clear_debug (apr_pools.c:1625)
==28130==    by 0x669CE1A: process_lingering_close (event.c:1253)
==28130==    by 0x669DA9F: listener_thread (event.c:1485)
==28130==    by 0x58F38C9: start_thread (pthread_create.c:300)
==28130==    by 0x600486C: clone (clone.S:112)
==28130== 
==28130== Invalid read of size 8
==28130==    at 0x508D0FD: pool_alloc (apr_pools.c:1482)
==28130==    by 0x66A0ECD: ap_push_pool (fdqueue.c:235)
==28130==    by 0x669CE2A: process_lingering_close (event.c:1254)
==28130==    by 0x669DA9F: listener_thread (event.c:1485)
==28130==    by 0x58F38C9: start_thread (pthread_create.c:300)
==28130==    by 0x600486C: clone (clone.S:112)
==28130==  Address 0x41414141414141a1 is not stack'd, malloc'd or (recently) free'd

Without pool debugging valgrind gives:

==10434== Thread 15:
==10434== Invalid read of size 4
==10434==    at 0x802D5BB: svn_fs_bdb__open_internal (env.c:660)
==10434==    by 0x802D679: svn_fs_bdb__open (env.c:672)
==10434==    by 0x80390D7: open_databases (fs.c:536)
==10434==    by 0x8039C26: base_open (fs.c:763)
==10434==    by 0x77445A5: svn_fs_open (fs-loader.c:374)
==10434==    by 0x752D8C6: get_repos (repos.c:1416)
==10434==    by 0x752DA13: svn_repos_open2 (repos.c:1462)
==10434==    by 0x72EBB1B: get_resource (repos.c:2159)
==10434==    by 0x70B7B73: dav_get_resource (mod_dav.c:712)
==10434==    by 0x70BC768: dav_method_options (mod_dav.c:1602)
==10434==    by 0x70BDAE7: dav_handler (mod_dav.c:4706)
==10434==    by 0x44BBBF: ap_run_handler (config.c:169)
==10434==  Address 0x17a0b690 is 16 bytes inside a block of size 24 free'd
==10434==    at 0x4C240FD: free (vg_replace_malloc.c:366)
==10434==    by 0x802D0BF: svn_fs_bdb__close (env.c:539)
==10434==    by 0x8038AAA: cleanup_fs (fs.c:183)
==10434==    by 0x8038B36: cleanup_fs_apr (fs.c:289)
==10434==    by 0x508DBCD: apr_pool_clear (apr_pools.c:2359)
==10434==    by 0x669ADE3: process_lingering_close (event.c:1253)
==10434==    by 0x669B987: listener_thread (event.c:1485)
==10434==    by 0x58F18C9: start_thread (pthread_create.c:300)
==10434==    by 0x600286C: clone (clone.S:112)
==10434== 
==10434== Invalid write of size 4
==10434==    at 0x802D5C1: svn_fs_bdb__open_internal (env.c:660)
==10434==    by 0x802D679: svn_fs_bdb__open (env.c:672)
==10434==    by 0x80390D7: open_databases (fs.c:536)
==10434==    by 0x8039C26: base_open (fs.c:763)
==10434==    by 0x77445A5: svn_fs_open (fs-loader.c:374)
==10434==    by 0x752D8C6: get_repos (repos.c:1416)
==10434==    by 0x752DA13: svn_repos_open2 (repos.c:1462)
==10434==    by 0x72EBB1B: get_resource (repos.c:2159)
==10434==    by 0x70B7B73: dav_get_resource (mod_dav.c:712)
==10434==    by 0x70BC768: dav_method_options (mod_dav.c:1602)
==10434==    by 0x70BDAE7: dav_handler (mod_dav.c:4706)
==10434==    by 0x44BBBF: ap_run_handler (config.c:169)
==10434==  Address 0x17a0b690 is 16 bytes inside a block of size 24 free'd
==10434==    at 0x4C240FD: free (vg_replace_malloc.c:366)
==10434==    by 0x802D0BF: svn_fs_bdb__close (env.c:539)
==10434==    by 0x8038AAA: cleanup_fs (fs.c:183)
==10434==    by 0x8038B36: cleanup_fs_apr (fs.c:289)
==10434==    by 0x508DBCD: apr_pool_clear (apr_pools.c:2359)
==10434==    by 0x669ADE3: process_lingering_close (event.c:1253)
==10434==    by 0x669B987: listener_thread (event.c:1485)
==10434==    by 0x58F18C9: start_thread (pthread_create.c:300)
==10434==    by 0x600286C: clone (clone.S:112)


-- 
uberSVN: Apache Subversion Made Easy
http://www.uberSVN.com

Re: Subversion BDB doesn't work with Apache 2.4 event MPM

Posted by Branko Čibej <br...@apache.org>.
On 02.04.2012 18:46, Philip Martin wrote:
> I am unable to run the Subversion regression tests for BDB and DAV with
> Apache 2.4 using the event MPM (the new default MPM in 2.4).  Sometimes
> mod_dav_svn SEGVs causing tests to FAIL and sometimes the tests hang
> with mod_dav_svn spinning in a loop.  The problems go away when I switch
> to the worker MPM (the old default MPM in 2.2).
>
> I'm not sure what the event MPM does that causes the problem.  Here is a
> typical SEGV:
>
> #0  0x00007fb2017f8e96 in svn_error_clear (err=0x2031203920312038)
>     at ../src/subversion/libsvn_subr/error.c:341
> 341       while (err->child)
> (gdb) p err
> $1 = (svn_error_t *) 0x2031203920312038
> (gdb) p err[0]
> Cannot access memory at address 0x2031203920312038

Looks like the DB_REGISTER dance biting us yet again. Pool lifetimes
most likely, it's a tricky bit of code. Hard to say for certain since I
don't know how the event MPM works.

-- Brane