You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by James McCoy <ja...@debian.org> on 2015/05/20 06:19:00 UTC

Re: Segfault in Perl bindings when commit touches a large number of files

On Tue, Apr 07, 2015 at 12:01:28PM +0100, Philip Martin wrote:
> Roderich Schupp <ro...@gmail.com> writes:
> 
> > On Monday, March 23, 2015 at 1:35:49 PM UTC+1, Philip Martin wrote:
> >>
> >> This makes svn_swig_pl_from_md5 follow the same pattern as
> >> svn_swig_pl_from_stream.  I've committed this to trunk as r1668618.
> 
> > Using Swig's %append_output is the correct way. I fixed another occurrence
> > of the above pattern in r1671388.
> 
> Both of these are now in the STATUS file for 1.9, 1.8 and 1.7.

I'm still experiencing the crash in 1.9.0-rc1 unless I apply the
original change I suggested to the generated
subversion/bindings/swig/perl/native/svn_delta.c.  I'm not sure how that
specific part of the file is generated, so I'll just attach a diff of
the generated file.

Cheers,
-- 
James
GPG Key: 4096R/331BA3DB 2011-12-05 James McCoy <ja...@debian.org>

Re: Segfault in Perl bindings when commit touches a large number of files

Posted by Roderich Schupp <ro...@gmail.com>.
On Wed, Jun 3, 2015 at 5:54 PM, Roderich Schupp <ro...@gmail.com>
wrote:

> On Wed, May 27, 2015 at 7:56 PM, Roderich Schupp <
> roderich.schupp@gmail.com> wrote:
>
>> I've dug a little deeper and think I've found a serious flaw in how the
>> Perl bindings handle
>> the Perl arguments and return values stack.
>>
>
> I've committed a series of patches (r1683261:1683271) to address this
> problem:
>

However... This fixes only the obvious cause why the local Perl call stack
of a Swig generated
wrapper may get out of sync with the global Perl stack: calling a helper
function that
(transitively) calls back into Perl.
But things may get more complicated. The wrapped Subversion function itself
may call
a callback function that is actually a Perl sub (via some thunkery).
For real life examples, just run some tests in
subversion/binding/swig/perl/native under GDB.
Since "calling back into Perl" always happens by calling
svn_swig_pl_callback_thunk,
you can simply set a breakpoint there, look upwards in the C call stack
for a function named _wrap_svn_* , then check the function immediately
below it:

This is from running "perl -Mblib t/1repost.t":

Breakpoint 1, svn_swig_pl_callback_thunk (
    caller_func=caller_func@entry=CALL_SV, func=func@entry=0x846bd0,
    result=result@entry=0x0, fmt=fmt@entry=0x7ffff4704f33 "srS")
    at subversion/bindings/swig/perl/libsvn_swig_perl/swigutil_pl.c:528
528     {
(gdb) where
#0  svn_swig_pl_callback_thunk (caller_func=caller_func@entry=CALL_SV,
    func=func@entry=0x846bd0, result=result@entry=0x0,
    fmt=fmt@entry=0x7ffff4704f33 "srS")
    at subversion/bindings/swig/perl/libsvn_swig_perl/swigutil_pl.c:528
#1  0x00007ffff4702c99 in svn_swig_pl_thunk_history_func (
    baton=baton@entry=0x846bd0, path=0x7ffff7ef1100 "/tags/foo/filea",
    revision=2, pool=pool@entry=0x7ffff7ef1028)
    at subversion/bindings/swig/perl/libsvn_swig_perl/swigutil_pl.c:1093
#2  0x00007ffff511935a in svn_repos_history2 (fs=fs@entry=0x7ffff7ff0428,
    path=path@entry=0xd5d6b0 "tags/foo/filea",
    history_func=0x7ffff4702c30 <svn_swig_pl_thunk_history_func>,
    history_baton=history_baton@entry=0x846bd0,
    authz_read_func=authz_read_func@entry=0x0,
    authz_read_baton=authz_read_baton@entry=0x0, start=0, end=<optimized
out>,
    cross_copies=1, pool=0x7ffff7fef028)
    at subversion/libsvn_repos/rev_hunt.c:275
#3  0x00007ffff51044bc in svn_repos_history (fs=fs@entry=0x7ffff7ff0428,
<======
    path=path@entry=0xd5d6b0 "tags/foo/filea", history_func=<optimized
out>,
    history_baton=history_baton@entry=0x846bd0, start=start@entry=0,
    end=end@entry=2, cross_copies=1, pool=0x7ffff7fef028)
    at subversion/libsvn_repos/deprecated.c:585
#4  0x00007ffff03bdb6e in _wrap_svn_repos_history (my_perl=<optimized
out>,
    cv=<optimized out>) at svn_repos.c:8974
#5  0x00000000004bd3fa in Perl_pp_entersub (my_perl=0x7d1010) at
pp_hot.c:3270
#6  0x00000000004b6296 in Perl_runops_standard (my_perl=0x7d1010) at
run.c:41
#7  0x00000000004439b9 in S_run_body (oldscope=1, my_perl=0x7d1010)
    at perl.c:2448
#8  perl_run (my_perl=0x7d1010) at perl.c:2371
#9  0x000000000041cbdb in main (argc=3, argv=0x7fffffffe228,
    env=0x7fffffffe248) at perlmain.c:116

In the wrapper (_wrap_svn_repos_history) this is the line

{
      result = (svn_error_t *)svn_repos_history(arg1,(char const
*)arg2,arg3,arg4,arg5,arg6,arg7,arg8);
}

so this call should be bracketed (though I don't know how to tell Swig)

{
      PUTBACK;
      result = (svn_error_t *)svn_repos_history(arg1,(char const
*)arg2,arg3,arg4,arg5,arg6,arg7,arg8);
      SPAGAIN;
}


Here's another one from running "perl -Mblib t/7editor.t":

Breakpoint 1, svn_swig_pl_callback_thunk (
    caller_func=caller_func@entry=CALL_SV, func=0xf0a618,
    result=result@entry=0x0, fmt=fmt@entry=0x7ffff491e064 "rss")
    at subversion/bindings/swig/perl/libsvn_swig_perl/swigutil_pl.c:528
528     {
(gdb) where
#0  svn_swig_pl_callback_thunk (caller_func=caller_func@entry=CALL_SV,
    func=0xf0a618, result=result@entry=0x0, fmt=fmt@entry=0x7ffff491e064
"rss")
    at subversion/bindings/swig/perl/libsvn_swig_perl/swigutil_pl.c:528
#1  0x00007ffff491be2e in svn_swig_pl_thunk_commit_callback (
    new_revision=<optimized out>, date=<optimized out>,
    author=<optimized out>, baton=<optimized out>)
    at subversion/bindings/swig/perl/libsvn_swig_perl/swigutil_pl.c:1132
#2  0x00007ffff53195f3 in invoke_commit_cb (
    commit_cb=0x7ffff4b69490 <commit_wrapper_callback>,
    commit_baton=0x7ffff7fcc1b0, fs=0x7ffff7ff0428, revision=1,
    post_commit_errstr=post_commit_errstr@entry=0x0,
    scratch_pool=scratch_pool@entry=0x7ffff7fef028)
    at subversion/libsvn_repos/commit.c:232
#3  0x00007ffff531a57d in close_edit (edit_baton=0x7ffff7fe70a0,
    pool=0x7ffff7fef028) at subversion/libsvn_repos/commit.c:875
#4  0x00007ffff013b7db in svn_delta_editor_invoke_close_edit (    <======
    _obj=0x7ffff7fcc3a0, scratch_pool=0x7ffff7fef028,
    edit_baton=<optimized out>) at svn_delta.c:2046
#5  _wrap_svn_delta_editor_invoke_close_edit (my_perl=<optimized out>,
    cv=<optimized out>) at svn_delta.c:7253
#6  0x00000000004bd3fa in Perl_pp_entersub (my_perl=0x7d1010) at
pp_hot.c:3270
#7  0x00000000004b6296 in Perl_runops_standard (my_perl=0x7d1010) at
run.c:41
#8  0x00000000004439b9 in S_run_body (oldscope=1, my_perl=0x7d1010)
    at perl.c:2448
#9  perl_run (my_perl=0x7d1010) at perl.c:2371
#10 0x000000000041cbdb in main (argc=3, argv=0x7fffffffe228,
    env=0x7fffffffe248) at perlmain.c:116


Cheers, Roderich

Re: Segfault in Perl bindings when commit touches a large number of files

Posted by Roderich Schupp <ro...@gmail.com>.
On Wed, May 27, 2015 at 7:56 PM, Roderich Schupp <ro...@gmail.com>
wrote:

> I've dug a little deeper and think I've found a serious flaw in how the
> Perl bindings handle
> the Perl arguments and return values stack.
>

I've committed a series of patches (r1683261:1683271) to address this
problem:

   - identify helper functions that (transitively) may call back into Perl
   - bracket all calls to these functions in Swig rules with PUTBACK/SPAGAIN
   (or only SPAGAIN for "in" typemaps, as the local stack pointer does not
   "move" during input parameter handling)

Cheers, Roderich

Re: Segfault in Perl bindings when commit touches a large number of files

Posted by Roderich Schupp <ro...@gmail.com>.
On Sun, May 24, 2015 at 2:51 AM, James McCoy <ja...@debian.org> wrote:

> The patch is working for me as well.
>

I've dug a little deeper and think I've found a serious flaw in how the
Perl bindings handle
the Perl arguments and return values stack.

Swig generates for Perl what is known (in the Perl world) as XS code:
basically C source with a heavy dose of macros and functions from the Perl
internals.
At issue here is the Perl arguments and return values stack, it's an array
where the (Perl) arguments for the generated function are found and
the (Perl) return values will be stored. The stack pointer is a complex
expression, so for performance reasons it's value is stored into a local
variable of the function (that's what the "dSP" does that Swig puts at the
start
of every generated function). All other macros that deal with the stack,
e.g. ST, XPUSHs, EXTEND, use this "local stack pointer".

This works under the assumption that the "global stack pointer" doesn't
change until the function returns. This assumption is violated when the
XS code "calls back" into Perl (using one of the Perl internal functions
call_method, call_sv or call_pv). During the call back, the Perl argument
stack might get expanded (i.e. reallocated), so local and global stack
pointers
get out of sync. You must be synchronize them right before and after
calling any call_* (using macros PUTBACK and SPAGAIN).
See the perlcall man page <http://perldoc.perl.org/perlcall.html> for the
gory details.

*No* Swig rules for the Perl bindings generate calls to any call_* function,
so no problem here. There is one universal wrapper for call_sv and
call_method:
function svn_swig_pl_callback_thunk in the Perl bindings helper library,
though.
It's itself XS code and AFAICT employs the correct magic around the actual
call of call_method or call_sv. This function *is* called in Swig rules.

But it does call back into Perl hence it *must* be treated the same way as
the
Perl native call_* functions, i.e. bracketed with PUTBACK/SPAGAIN.
Actually there is only one direct call to svn_swig_pl_callback_thunk in the
Swig rules. But it is used in a lot of other helper functions in
libsvn_swig_perl.
And hence these functions (and their callers, caller's callers etc)
*must* be bracketed, too, when used in Swig rules.

Failing to do so might work as long as the call back chain and the combined
stack usage stays small enough that the Perl runtime doesn't have to
reallocate its argument stack somewhere in between. IMHO James McCoy's
problem exceeded this threshold. My previous patch didn't really solve
the problem, it only reduced the callback chain (by getting rid of the
first return value from _wrap_svn_txdelta_apply which is a Perl object,
generated by a callback to Perl "SVN::MD5->new").

Cheers, Roderich

Re: Segfault in Perl bindings when commit touches a large number of files

Posted by James McCoy <ja...@debian.org>.
On Fri, May 22, 2015 at 07:34:21PM +0200, Roderich Schupp wrote:
> So back to your proposal, with the following patch svn-git has benn running
> without crash for 15 minutes now.

The patch is working for me as well.

Cheers,
-- 
James
GPG Key: 4096R/331BA3DB 2011-12-05 James McCoy <ja...@debian.org>

Re: Segfault in Perl bindings when commit touches a large number of files

Posted by Roderich Schupp <ro...@gmail.com>.
On Fri, May 22, 2015 at 6:25 PM, Roderich Schupp <ro...@gmail.com>
wrote:

>
> But I'm not entirely convinced that the bug is really in the construction
> of the magical md5sum.
> Maybe git-svn is to blame, perhaps a problem with the lifetime of the
> pools it uses...
>

I just modified the Swig generated svn_delta.c so that
_wrap_svn_txdelta_apply thinks
it is called with parameter pool = undef, forcing it to use a global pool
in the construction of
the magical md5sum. svn-git crashes with this, so its pool handling is not
to blame.

So back to your proposal, with the following patch svn-git has benn running
without crash for 15 minutes now.

It has the consequence that it breaks programs that care for the magical
md5sum, though.
But svn-git isn't affected (the way it calls SVN::TxDelta::apply makes sure
that the
magical md5sum is discarded).

Cheers, Roderich

Re: Segfault in Perl bindings when commit touches a large number of files

Posted by Roderich Schupp <ro...@gmail.com>.
On Fri, May 22, 2015 at 6:07 PM, Philip Martin <ph...@codematters.co.uk>
wrote:

> It would help if I built the correct tree.  No, that is not enough, the
> regression tests fail.
>


No surprise here:
(1) Your patch simply means: ignore the result_digest, but also produce no
return value for it.
But users of the Perl wrapper still expect it to return 3 items:
the magical md5sum, the handler and the baton. Now the wrapper returns only
the latter two.
(2) So you should should at least return undef as the first item,e.g

%typemap(argout) unsigned char *result_digest {
   %append_output(&PL_sv_undef);
}

(3) This will make t/5delta.t fail, as it tests for the magical sum which
is gone now.

But I'm not entirely convinced that the bug is really in the construction
of the magical md5sum.
Maybe git-svn is to blame, perhaps a problem with the lifetime of the pools
it uses...

Cheers, Roderich

Re: Segfault in Perl bindings when commit touches a large number of files

Posted by Philip Martin <ph...@codematters.co.uk>.
Philip Martin <ph...@wandisco.com> writes:

> So, following the approach of r876245, would something like this do the
> trick?

It would help if I built the correct tree.  No, that is not enough, the
regression tests fail.

-- 
Philip

Re: Segfault in Perl bindings when commit touches a large number of files

Posted by Philip Martin <ph...@wandisco.com>.
Roderich Schupp <ro...@gmail.com> writes:

> As a simple experiment, I modified the Swig generated svn_delta.c to
> effectively
> ignore result_digest (AFAICT that's the same approach taken by the Python
> bindings):
>
> - always pass NULL for result_digest in the actual call to svn_txdelta_apply
> - always return undef as the first value from the Perl wrapper (it's
> ignored by git-svn anyway)
>
> et voilĂ : git svn clone -r 28995:HEAD svn://anonsvn.kde.org/home/kde
> has been humming along for more than an hour now.

So, following the approach of r876245, would something like this do the
trick?

Index: subversion/bindings/swig/include/svn_types.swg
===================================================================
--- subversion/bindings/swig/include/svn_types.swg	(revision 1680818)
+++ subversion/bindings/swig/include/svn_types.swg	(working copy)
@@ -1111,13 +1111,12 @@ svn_ ## TYPE ## _swig_rb_closed(VALUE self)
 #endif
 
 #ifdef SWIGPERL
-%typemap(in, numinputs=0) unsigned char *result_digest {
-    $1 = (unsigned char *)apr_palloc(_global_pool, APR_MD5_DIGESTSIZE);
-}
-
-%typemap(argout) unsigned char *result_digest {
-    %append_output(svn_swig_pl_from_md5($1));
-}
+/*
+ * Skip the md5sum
+ * FIXME: Wrap the md5sum
+ */
+%typemap(in, numinputs=0) unsigned char *result_digest
+  "$1 = NULL;";
 #endif
 
 /* Category 3 */

-- 
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*

Re: Segfault in Perl bindings when commit touches a large number of files

Posted by Roderich Schupp <ro...@gmail.com>.
On Wed, May 20, 2015 at 10:12 AM, Roderich Schupp <roderich.schupp@gmail.com
> wrote:

> Looking at your patch, the offending line may very well be the cause of
> the crash:


Nope, I'll take that back. When this line

    ST(argvi) = sv_newmortal();

is called argvi is still 0, so it is safe. This line is actually how Swig
wraps C functions
returning void. The code produced looks like this:

int argvi = 0;
...
/* handle input parameters, call the C function */
...
/* push return value onto Perl stack */
ST(argvi) = sv_newmortal();

/* push output parameters (if any) onto Perl stack, increment argvi
accordingly */
...

XSRETURN(argvi);

This is a little bit silly: if there are output parameters, the first one
will overwrite ST(0);
otherwise ST(0) will be discarded, since XSRETURN(0) is called at the end.
So, it's useless, but also harmless.

My prime suspects are parameters result_digest and (to a lesser extent)
error_info
of svn_txdelta_apply. These have totally weird behaviour even on the C side.
Both pass an address into svn_txdelta_apply that will be used long after
svn_txdelta_apply
has returned, apparently by the handler function (and baton) that are out
parameters.
The address error_info points to is probably only going to be read, but the
address result_digest points to will be written to. The latter gets special
treatment
in the Perl wrapper that I don't yet understand.

As a simple experiment, I modified the Swig generated svn_delta.c to
effectively
ignore result_digest (AFAICT that's the same approach taken by the Python
bindings):

- always pass NULL for result_digest in the actual call to svn_txdelta_apply
- always return undef as the first value from the Perl wrapper (it's
ignored by git-svn anyway)

et voilĂ : git svn clone -r 28995:HEAD svn://anonsvn.kde.org/home/kde
has been humming along for more than an hour now.

Cheers, Roderich

Re: Segfault in Perl bindings when commit touches a large number of files

Posted by Roderich Schupp <ro...@gmail.com>.
On Wed, May 20, 2015 at 6:19 AM, James McCoy <ja...@debian.org> wrote:

> I'm still experiencing the crash in 1.9.0-rc1 unless I apply the
> original change I suggested to the generated
> subversion/bindings/swig/perl/native/svn_delta.c.  I'm not sure how that
> specific part of the file is generated, so I'll just attach a diff of
> the generated file.
>

Looking at your patch, the offending line may very well be the cause of the
crash:
it stores  something onto the top (thats the argvi index) of the Perl
argument stack (which
is also used to hold a Perl sub's return values), but doesn't make sure
that
the stack is large enough. Typically you should see generated code like this


if (argvi >= items) EXTEND(sp,1);       /* grow the stack by one if we're
past
                                                                the input
arguments (which are guaranteed
                                                                to be
allocated) */
ST(argvi++) = ...                                   /* push a return value
on top of the stack */

But the problem is deeper: the handling of svn_txdelta_apply's parameter
result_digest is wrong. It's a weird kind of output parameter: it's the
address
of an array of bytes for an MD5 checksum. But svn_txdelta_apply doesn't
actually store something in there, this address is "remembered" by the
returned
handler and baton and is filled in by the final call to the handler.
This is NOT how output parameters are handled in the Perl bindings:
output parameters in C become actual return values in Perl (since Perl can
return multiple values from a sub).
The generated code treats result_digest as a regular output parameter
(which won't work - at least it won't give you the actual MD5), but
also generates the offending line. The easiest fix would be to ignore
the passed in result_digest and set it to NULL when actually calling
svn_txdelta_apply (meaning: we're not interested). I think that's
what the Python bindings do.
I'll work on this over the weekend if nobody beats me too it.



Cheers, Roderich