You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Stefan Küng <to...@gmail.com> on 2012/11/30 19:16:33 UTC

Error running context

Hi,

Using an svn build of trunk from about two hours ago, I'm trying to 
commit a binary file to the TSVN repository on Google code. Tried 
several times, and others have the same problem when they try to commit 
this file.

Here's how to reproduce:

$ svn co 
https://tortoisesvn.googlecode.com/svn/trunk/src/Resources/tools tools

get the file here:
https://skydrive.live.com/redir?resid=D000F60A347E5B37!11352
and replace the one in 'tools' with this one.

$ svn ci -m "" tools

Sending        tools\convert.exe
Transmitting file data 
.D:\Development\SVN\TortoiseSVN\ext\subversion\subversion
\svn\commit-cmd.c:183: (apr_err=20014)
D:\Development\SVN\TortoiseSVN\ext\subversion\subversion\libsvn_client\commit.c:
1964: (apr_err=20014)
D:\Development\SVN\TortoiseSVN\ext\subversion\subversion\libsvn_client\commit.c:
1088: (apr_err=20014)
svn: E020014: Commit failed (details follow):
D:\Development\SVN\TortoiseSVN\ext\subversion\subversion\libsvn_client\commit.c:
1891: (apr_err=20014)
D:\Development\SVN\TortoiseSVN\ext\subversion\subversion\libsvn_client\commit_ut
il.c:1846: (apr_err=20014)
D:\Development\SVN\TortoiseSVN\ext\subversion\subversion\libsvn_wc\adm_crawler.c
:1174: (apr_err=20014)
D:\Development\SVN\TortoiseSVN\ext\subversion\subversion\libsvn_ra_serf\commit.c
:2123: (apr_err=20014)
D:\Development\SVN\TortoiseSVN\ext\subversion\subversion\libsvn_ra_serf\util.c:8
19: (apr_err=20014)
D:\Development\SVN\TortoiseSVN\ext\subversion\subversion\libsvn_ra_serf\util.c:7
86: (apr_err=20014)
svn: E020014: Error running context: Internal error


Obviously you need commit access to the TSVN repo, so please tell me 
your GCode username and I'll get you that.

I've also tried to commit the file to my test repo on a Windows Apache 
setup, and there is succeeded when I added the file.
But then when I tried to commit a change to that file (copied over 
another version of the file), it failed with:
Commit failed (details follow):
Error running context: An existing connection was forcibly closed by the 
remote host.

Unfortunately there's nothing in the apache error log.


svn, version 1.8.0-dev (under development)
    compiled Nov 30 2012, 18:21:15 on x86-microsoft-windows

Copyright (C) 2012 The Apache Software Foundation.
This software consists of contributions made by many people;
see the NOTICE file for more information.
Subversion is open source software, see http://subversion.apache.org/

The following repository access (RA) modules are available:

* ra_svn : Module for accessing a repository using the svn network protocol.
   - with Cyrus SASL authentication
   - handles 'svn' scheme
* ra_local : Module for accessing a repository on local disk.
   - handles 'file' scheme
* ra_serf : Module for accessing a repository via WebDAV protocol using 
serf.
   - handles 'http' scheme
   - handles 'https' scheme


Stefan

-- 
        ___
   oo  // \\      "De Chelonian Mobile"
  (_,\/ \_/ \     TortoiseSVN
    \ \_/_\_/>    The coolest interface to (Sub)version control
    /_/   \_\     http://tortoisesvn.net

Re: Error running context

Posted by Stefan Fuhrmann <st...@wandisco.com>.
On Sun, Dec 2, 2012 at 3:05 PM, Lieven Govaerts <sv...@mobsol.be> wrote:

> Attached the patch. ( I get paid per mail I send to this list in case
> no one noticed. ;) )
>
> On Sun, Dec 2, 2012 at 2:57 PM, Lieven Govaerts <sv...@mobsol.be> wrote:
> > Hi,
> >
> > On Fri, Nov 30, 2012 at 8:19 PM, Philip Martin
> > <ph...@wandisco.com> wrote:
> >> Stefan Küng <to...@gmail.com> writes:
> >>
> >>> Here's how to reproduce:
> >>>
> >>> $ svn co
> https://tortoisesvn.googlecode.com/svn/trunk/src/Resources/tools tools
> >>>
> >>> get the file here:
> >>> https://skydrive.live.com/redir?resid=D000F60A347E5B37!11352
> >>> and replace the one in 'tools' with this one.
> >>
> >> I can reproduce locally by importing tools into a local repository,
> >> checking out, replacing the file and attempting the commit.  That is
> >> using serf 1.1.x.  Using serf trunk the commit goes into a loop.
> >>
> >
> > I see the same problem in a local repository. With some extra logging
> > I see that one of the delta windows isn't handled correctly by the
> > server:
> >
> > This is svn trunk with serf:
> > write_handler window: {sview_offset = 102400, sview_len = 102400,
> > tview_len = 102400, num_ops = 55, src_ops = 27, ops->action =
> > svn_txdelta_new, new_data = 0x15cbc28}
> > write_handler window: {sview_offset = 204800, sview_len = 102400,
> > tview_len = 102400, num_ops = 143, src_ops = 71, ops->action =
> > svn_txdelta_new, new_data = 0x15c0028}
> > write_handler window: {sview_offset = 307200, sview_len = 102400,
> > tview_len = 102400, num_ops = 23, src_ops = 11, ops->action =
> > svn_txdelta_new, new_data = 0x15be428}
> > write_handler window: {sview_offset = 0, sview_len = 0, tview_len =
> > 102400, num_ops = 1, src_ops = 0, ops->action = svn_txdelta_new,
> > new_data = 0x17e8028}
> >
> > This is svn 1.7.7 with neon:
> > write_handler window: {sview_offset = 102400, sview_len = 102400,
> > tview_len = 102400, num_ops = 55, src_ops = 27, ops->action =
> > svn_txdelta_new, new_data = 0x15cbc28}
> > write_handler window: {sview_offset = 204800, sview_len = 102400,
> > tview_len = 102400, num_ops = 143, src_ops = 71, ops->action =
> > svn_txdelta_new, new_data = 0x15c0028}
> > write_handler window: {sview_offset = 307200, sview_len = 102400,
> > tview_len = 102400, num_ops = 23, src_ops = 11, ops->action =
> > svn_txdelta_new, new_data = 0x15be428}
> > write_handler window: {sview_offset = 0, sview_len = 0, tview_len =
> > 102400, num_ops = 1, src_ops = 0, ops->action = svn_txdelta_new,
> > new_data = 0x17e8028}
> > ...
> >
> > The core issue seems to be introduced in r1390435 as part of the
> > svndiff optimizations.
>

Thanks everyone for digging! The problem did not
show up in my tests as it only manifests when files
>100kB get updated and one of their 100k windows
does not match any source data.


> > Attached patch fixes the issue for me. I don't know how it impacts
> > other parts of the code, so review is appreciated. The patch still
> > contains logging so not meant to be applied directly!
>

Applied the relevant part as 1416194.

-- Stefan^2.

-- 
Certified & Supported Apache Subversion Downloads:
*

http://www.wandisco.com/subversion/download
*

Re: Error running context

Posted by Philip Martin <ph...@wandisco.com>.
Ben Reser <be...@reser.org> writes:

> On Sun, Dec 2, 2012 at 9:18 AM, Justin Erenkrantz <ju...@erenkrantz.com> wrote:
>> As for the looping, didn't we put in logic to stop retrying after a number
>> of request failures?  That's probably a reasonable thing to do...I
>> definitely think a very nice feature of serf is that it *will* retry (which
>> is helpful in flaky network situations), but it needs to stop at a certain
>> point.  =)  -- justin
>
> HTTP RFC suggests only 1 automatic retry on idempotent request sequences:
> http://www.w3.org/Protocols/rfc2616/rfc2616-sec8.html#sec8.1.4

Retrying does make things really complicated with the current FSFS
implementation.  In this case the server is aborting which means that
the recent FSFS change to introduce rep_write_cleanup will be defeated:
any partial representation data written to the protorev file will remain
in the file.  It's not significant in this case because the transaction
will never become a revision, all the retries will fail in the same way.

However if the PUT failed because some other request in another thread
had crashed the process then the retry could succeed.  Then we would end
up with a revision file that contained some unreferenced junk data.

If the PUT failed due to a network issue the retry may succeed if it is
proccessed by some other server process.  If the retry is processed by
the same server process as the failed request then whether the retry
succeeds depends on whether the server has recognised the network
problem and run rep_write_cleanup.

-- 
Certified & Supported Apache Subversion Downloads:
http://www.wandisco.com/subversion/download

Re: Error running context

Posted by Ben Reser <be...@reser.org>.
On Sun, Dec 2, 2012 at 9:18 AM, Justin Erenkrantz <ju...@erenkrantz.com> wrote:
> As for the looping, didn't we put in logic to stop retrying after a number
> of request failures?  That's probably a reasonable thing to do...I
> definitely think a very nice feature of serf is that it *will* retry (which
> is helpful in flaky network situations), but it needs to stop at a certain
> point.  =)  -- justin

HTTP RFC suggests only 1 automatic retry on idempotent request sequences:
http://www.w3.org/Protocols/rfc2616/rfc2616-sec8.html#sec8.1.4

Re: Error running context

Posted by Lieven Govaerts <sv...@mobsol.be>.
Hi,

On Sun, Dec 2, 2012 at 3:18 PM, Justin Erenkrantz <ju...@erenkrantz.com> wrote:
> On Sun, Dec 2, 2012 at 9:05 AM, Lieven Govaerts <sv...@mobsol.be> wrote:
>>
>> Attached the patch. ( I get paid per mail I send to this list in case
>> no one noticed. ;) )
>
>
> The patch looks right to me - the short-circuit to do the simple window can
> clearly be called for any window regardless of offset.  Perhaps we can add
> this testcase to the regression tests as well?
>
> As for the looping, didn't we put in logic to stop retrying after a number
> of request failures?  That's probably a reasonable thing to do...I
> definitely think a very nice feature of serf is that it *will* retry (which
> is helpful in flaky network situations), but it needs to stop at a certain
> point.  =)  -- justin

Serf is resending requests that were waiting on the pipeline to be
handled by the server, but when a read error is received it will not
resend the request.
Maybe ra_serf is doing that in some situations, but I don't think so either.

The issue here is that serf trunk was ignoring a read error on the
connection with an outstanding request, as a side effect if r1695.
That has been fixed in r1696.

Lieven

Re: Error running context

Posted by Justin Erenkrantz <ju...@erenkrantz.com>.
On Sun, Dec 2, 2012 at 9:05 AM, Lieven Govaerts <sv...@mobsol.be> wrote:

> Attached the patch. ( I get paid per mail I send to this list in case
> no one noticed. ;) )


The patch looks right to me - the short-circuit to do the simple window can
clearly be called for any window regardless of offset.  Perhaps we can add
this testcase to the regression tests as well?

As for the looping, didn't we put in logic to stop retrying after a number
of request failures?  That's probably a reasonable thing to do...I
definitely think a very nice feature of serf is that it *will* retry (which
is helpful in flaky network situations), but it needs to stop at a certain
point.  =)  -- justin

Re: Error running context

Posted by Lieven Govaerts <sv...@mobsol.be>.
Attached the patch. ( I get paid per mail I send to this list in case
no one noticed. ;) )

On Sun, Dec 2, 2012 at 2:57 PM, Lieven Govaerts <sv...@mobsol.be> wrote:
> Hi,
>
> On Fri, Nov 30, 2012 at 8:19 PM, Philip Martin
> <ph...@wandisco.com> wrote:
>> Stefan Küng <to...@gmail.com> writes:
>>
>>> Here's how to reproduce:
>>>
>>> $ svn co https://tortoisesvn.googlecode.com/svn/trunk/src/Resources/tools tools
>>>
>>> get the file here:
>>> https://skydrive.live.com/redir?resid=D000F60A347E5B37!11352
>>> and replace the one in 'tools' with this one.
>>
>> I can reproduce locally by importing tools into a local repository,
>> checking out, replacing the file and attempting the commit.  That is
>> using serf 1.1.x.  Using serf trunk the commit goes into a loop.
>>
>
> I see the same problem in a local repository. With some extra logging
> I see that one of the delta windows isn't handled correctly by the
> server:
>
> This is svn trunk with serf:
> write_handler window: {sview_offset = 102400, sview_len = 102400,
> tview_len = 102400, num_ops = 55, src_ops = 27, ops->action =
> svn_txdelta_new, new_data = 0x15cbc28}
> write_handler window: {sview_offset = 204800, sview_len = 102400,
> tview_len = 102400, num_ops = 143, src_ops = 71, ops->action =
> svn_txdelta_new, new_data = 0x15c0028}
> write_handler window: {sview_offset = 307200, sview_len = 102400,
> tview_len = 102400, num_ops = 23, src_ops = 11, ops->action =
> svn_txdelta_new, new_data = 0x15be428}
> write_handler window: {sview_offset = 0, sview_len = 0, tview_len =
> 102400, num_ops = 1, src_ops = 0, ops->action = svn_txdelta_new,
> new_data = 0x17e8028}
>
> This is svn 1.7.7 with neon:
> write_handler window: {sview_offset = 102400, sview_len = 102400,
> tview_len = 102400, num_ops = 55, src_ops = 27, ops->action =
> svn_txdelta_new, new_data = 0x15cbc28}
> write_handler window: {sview_offset = 204800, sview_len = 102400,
> tview_len = 102400, num_ops = 143, src_ops = 71, ops->action =
> svn_txdelta_new, new_data = 0x15c0028}
> write_handler window: {sview_offset = 307200, sview_len = 102400,
> tview_len = 102400, num_ops = 23, src_ops = 11, ops->action =
> svn_txdelta_new, new_data = 0x15be428}
> write_handler window: {sview_offset = 0, sview_len = 0, tview_len =
> 102400, num_ops = 1, src_ops = 0, ops->action = svn_txdelta_new,
> new_data = 0x17e8028}
> ...
>
> The core issue seems to be introduced in r1390435 as part of the
> svndiff optimizations.
>
> Attached patch fixes the issue for me. I don't know how it impacts
> other parts of the code, so review is appreciated. The patch still
> contains logging so not meant to be applied directly!
>
>> As far as I can tell the problem is the client causing mod_dav_svn to
>> SEGV (serf trunk keep retrying and causing multiple SEGVs).  The
>> mod_dav_svn stack trace isn't very useful, I'll need a httpd debug
>> build:
>>
>> Program received signal SIGSEGV, Segmentation fault.
>> [Switching to Thread 0x7fe2c42e7700 (LWP 31534)]
>> 0x00007fe2c98245cc in apr_brigade_cleanup () from /usr/lib/libaprutil-1.so.0
>> (gdb) bt
>> #0  0x00007fe2c98245cc in apr_brigade_cleanup ()
>>    from /usr/lib/libaprutil-1.so.0
>> #1  0x00007fe2c75258bf in ?? () from /usr/lib/apache2/modules/mod_dav.so
>> #2  0x00007fe2c7528960 in ?? () from /usr/lib/apache2/modules/mod_dav.so
>> #3  0x00007fe2c9ee51f0 in ap_run_handler ()
>> #4  0x00007fe2c9ee563b in ap_invoke_handler ()
>> #5  0x00007fe2c9ef5448 in ap_process_request ()
>> #6  0x00007fe2c9ef2308 in ?? ()
>> #7  0x00007fe2c9eebbb0 in ap_run_process_connection ()
>> #8  0x00007fe2c9efb55d in ?? ()
>> #9  0x00007fe2c960f597 in ?? () from /usr/lib/libapr-1.so.0
>> #10 0x00007fe2c93cbb50 in start_thread (arg=<optimized out>)
>>     at pthread_create.c:304
>> #11 0x00007fe2c9115a7d in clone ()
>>     at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
>> #12 0x0000000000000000 in ?? ()
>>
>> I'd guess it's memory corruption in the server.
>
> Well, besides the client seemingly sending incorrect svndiff windows,
> the server should not crash. I got the following stack trace from
> httpd in the debugger:
>
> Out of memory - terminating application.
>
> Program received signal SIGABRT, Aborted.
> 0x00007fff88cd7ce2 in __pthread_kill ()
> (gdb) bt
> #0  0x00007fff88cd7ce2 in __pthread_kill ()
> #1  0x00007fff8381f7d2 in pthread_kill ()
> #2  0x00007fff83810a7a in abort ()
> #3  0x00000001011ef651 in abort_on_pool_failure (retcode=12) at pool.c:55
> #4  0x000000010030e290 in apr_palloc ()
> #5  0x00000001012067c7 in svn_stringbuf_create_ensure
> (blocksize=12804161111182623672, pool=0x100a72428) at string.c:329
> #6  0x0000000101206867 in svn_stringbuf_ncreate (bytes=0x1017dd035
> "??", size=12804161111182623667, pool=0x100a72428)
>     at string.c:346
> #7  0x0000000101199dbe in write_handler (baton=0x100a048b8,
> buffer=0x1009bfc48 "????ل$8\001", len=0x7fff5fbff2d8) at svndiff.c:886
> #8  0x00000001012011fa in svn_stream_write (stream=0x100a04900,
> data=0x1009bfc48 "????ل$8\001", len=0x7fff5fbff2d8) at stream.c:162
> #9  0x000000010102d30f in write_stream (stream=0x1009c8ba8,
> buf=0x1009bfc48, bufsize=2048) at repos.c:2892
> #10 0x00000001007969d4 in dav_handler ()
> #11 0x0000000100001cd6 in ap_invoke_handler ()
> #12 0x0000000100021433 in ap_process_request ()
> #13 0x000000010001eb50 in ap_process_http_connection ()
> #14 0x000000010000da28 in ap_process_connection ()
> #15 0x0000000100027219 in child_main ()
> #16 0x000000010002696a in make_child ()
> #17 0x000000010002600b in ap_mpm_run ()
> #18 0x0000000100007139 in main ()
> (gdb) frame 7
> #7  0x0000000101199dbe in write_handler (baton=0x100a048b8,
> buffer=0x1009bfc48 "????ل$8\001", len=0x7fff5fbff2d8) at svndiff.c:886
> 886           db->buffer =
> (gdb) p *len
> $1 = 2048
> (gdb) p remaining
> $2 = 12804161111182623667
> ..
> (gdb) p db->buffer->data
> $5 = 0xe4d8c0d9ec42b70f <Address 0xe4d8c0d9ec42b70f out of bounds>
>
> Looks like the db->buffer struct is overwritten with data, thereby
> invalidating the db->buffer->data pointer.
>
>
> A third issue is that serf is either segfaulting or retrying when the
> server aborts the connection due to this segfault. I'll look into this
> further.
>
> Lieven

Re: Error running context

Posted by Lieven Govaerts <sv...@mobsol.be>.
On Sun, Dec 2, 2012 at 2:57 PM, Lieven Govaerts <sv...@mobsol.be> wrote:
> Hi,
>
> On Fri, Nov 30, 2012 at 8:19 PM, Philip Martin
> <ph...@wandisco.com> wrote:
>> Stefan Küng <to...@gmail.com> writes:
>>
>>> Here's how to reproduce:
>>>
>>> $ svn co https://tortoisesvn.googlecode.com/svn/trunk/src/Resources/tools tools
>>>
>>> get the file here:
>>> https://skydrive.live.com/redir?resid=D000F60A347E5B37!11352
>>> and replace the one in 'tools' with this one.
>>
>> I can reproduce locally by importing tools into a local repository,
>> checking out, replacing the file and attempting the commit.  That is
>> using serf 1.1.x.  Using serf trunk the commit goes into a loop.
>>
>
> I see the same problem in a local repository. With some extra logging
> I see that one of the delta windows isn't handled correctly by the
> server:
>
> This is svn trunk with serf:
> write_handler window: {sview_offset = 102400, sview_len = 102400,
> tview_len = 102400, num_ops = 55, src_ops = 27, ops->action =
> svn_txdelta_new, new_data = 0x15cbc28}
> write_handler window: {sview_offset = 204800, sview_len = 102400,
> tview_len = 102400, num_ops = 143, src_ops = 71, ops->action =
> svn_txdelta_new, new_data = 0x15c0028}
> write_handler window: {sview_offset = 307200, sview_len = 102400,
> tview_len = 102400, num_ops = 23, src_ops = 11, ops->action =
> svn_txdelta_new, new_data = 0x15be428}
> write_handler window: {sview_offset = 0, sview_len = 0, tview_len =
> 102400, num_ops = 1, src_ops = 0, ops->action = svn_txdelta_new,
> new_data = 0x17e8028}
>
> This is svn 1.7.7 with neon:
> write_handler window: {sview_offset = 102400, sview_len = 102400,
> tview_len = 102400, num_ops = 55, src_ops = 27, ops->action =
> svn_txdelta_new, new_data = 0x15cbc28}
> write_handler window: {sview_offset = 204800, sview_len = 102400,
> tview_len = 102400, num_ops = 143, src_ops = 71, ops->action =
> svn_txdelta_new, new_data = 0x15c0028}
> write_handler window: {sview_offset = 307200, sview_len = 102400,
> tview_len = 102400, num_ops = 23, src_ops = 11, ops->action =
> svn_txdelta_new, new_data = 0x15be428}
> write_handler window: {sview_offset = 0, sview_len = 0, tview_len =
> 102400, num_ops = 1, src_ops = 0, ops->action = svn_txdelta_new,
> new_data = 0x17e8028}
> ...

Copy-paste error, this is the log content with neon:
write_handler window: {sview_offset = 0, sview_len = 102400, tview_len
= 102400, num_ops = 117, src_ops = 58, ops->action = svn_txdelta_new,
new_data = 0x3461028}
write_handler window: {sview_offset = 102400, sview_len = 102400,
tview_len = 102400, num_ops = 55, src_ops = 27, ops->action =
svn_txdelta_new, new_data = 0x34a9828}
write_handler window: {sview_offset = 204800, sview_len = 102400,
tview_len = 102400, num_ops = 143, src_ops = 71, ops->action =
svn_txdelta_new, new_data = 0x3461028}
write_handler window: {sview_offset = 307200, sview_len = 102400,
tview_len = 102400, num_ops = 23, src_ops = 11, ops->action =
svn_txdelta_new, new_data = 0x345aa28}
write_handler window: {sview_offset = 409600, sview_len = 102400,
tview_len = 102400, num_ops = 1, src_ops = 0, ops->action =
svn_txdelta_new, new_data = 0x3460028}
write_handler window: {sview_offset = 512000, sview_len = 102400,
tview_len = 102400, num_ops = 117, src_ops = 59, ops->action =
svn_txdelta_new, new_data = 0x345da28}
write_handler window: {sview_offset = 614400, sview_len = 102400,
tview_len = 102400, num_ops = 13, src_ops = 6, ops->action =
svn_txdelta_new, new_data = 0x36e1028}
...

>
> The core issue seems to be introduced in r1390435 as part of the
> svndiff optimizations.
>
> Attached patch fixes the issue for me. I don't know how it impacts
> other parts of the code, so review is appreciated. The patch still
> contains logging so not meant to be applied directly!
>
>> As far as I can tell the problem is the client causing mod_dav_svn to
>> SEGV (serf trunk keep retrying and causing multiple SEGVs).  The
>> mod_dav_svn stack trace isn't very useful, I'll need a httpd debug
>> build:
>>
>> Program received signal SIGSEGV, Segmentation fault.
>> [Switching to Thread 0x7fe2c42e7700 (LWP 31534)]
>> 0x00007fe2c98245cc in apr_brigade_cleanup () from /usr/lib/libaprutil-1.so.0
>> (gdb) bt
>> #0  0x00007fe2c98245cc in apr_brigade_cleanup ()
>>    from /usr/lib/libaprutil-1.so.0
>> #1  0x00007fe2c75258bf in ?? () from /usr/lib/apache2/modules/mod_dav.so
>> #2  0x00007fe2c7528960 in ?? () from /usr/lib/apache2/modules/mod_dav.so
>> #3  0x00007fe2c9ee51f0 in ap_run_handler ()
>> #4  0x00007fe2c9ee563b in ap_invoke_handler ()
>> #5  0x00007fe2c9ef5448 in ap_process_request ()
>> #6  0x00007fe2c9ef2308 in ?? ()
>> #7  0x00007fe2c9eebbb0 in ap_run_process_connection ()
>> #8  0x00007fe2c9efb55d in ?? ()
>> #9  0x00007fe2c960f597 in ?? () from /usr/lib/libapr-1.so.0
>> #10 0x00007fe2c93cbb50 in start_thread (arg=<optimized out>)
>>     at pthread_create.c:304
>> #11 0x00007fe2c9115a7d in clone ()
>>     at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
>> #12 0x0000000000000000 in ?? ()
>>
>> I'd guess it's memory corruption in the server.
>
> Well, besides the client seemingly sending incorrect svndiff windows,
> the server should not crash. I got the following stack trace from
> httpd in the debugger:
>
> Out of memory - terminating application.
>
> Program received signal SIGABRT, Aborted.
> 0x00007fff88cd7ce2 in __pthread_kill ()
> (gdb) bt
> #0  0x00007fff88cd7ce2 in __pthread_kill ()
> #1  0x00007fff8381f7d2 in pthread_kill ()
> #2  0x00007fff83810a7a in abort ()
> #3  0x00000001011ef651 in abort_on_pool_failure (retcode=12) at pool.c:55
> #4  0x000000010030e290 in apr_palloc ()
> #5  0x00000001012067c7 in svn_stringbuf_create_ensure
> (blocksize=12804161111182623672, pool=0x100a72428) at string.c:329
> #6  0x0000000101206867 in svn_stringbuf_ncreate (bytes=0x1017dd035
> "??", size=12804161111182623667, pool=0x100a72428)
>     at string.c:346
> #7  0x0000000101199dbe in write_handler (baton=0x100a048b8,
> buffer=0x1009bfc48 "????ل$8\001", len=0x7fff5fbff2d8) at svndiff.c:886
> #8  0x00000001012011fa in svn_stream_write (stream=0x100a04900,
> data=0x1009bfc48 "????ل$8\001", len=0x7fff5fbff2d8) at stream.c:162
> #9  0x000000010102d30f in write_stream (stream=0x1009c8ba8,
> buf=0x1009bfc48, bufsize=2048) at repos.c:2892
> #10 0x00000001007969d4 in dav_handler ()
> #11 0x0000000100001cd6 in ap_invoke_handler ()
> #12 0x0000000100021433 in ap_process_request ()
> #13 0x000000010001eb50 in ap_process_http_connection ()
> #14 0x000000010000da28 in ap_process_connection ()
> #15 0x0000000100027219 in child_main ()
> #16 0x000000010002696a in make_child ()
> #17 0x000000010002600b in ap_mpm_run ()
> #18 0x0000000100007139 in main ()
> (gdb) frame 7
> #7  0x0000000101199dbe in write_handler (baton=0x100a048b8,
> buffer=0x1009bfc48 "????ل$8\001", len=0x7fff5fbff2d8) at svndiff.c:886
> 886           db->buffer =
> (gdb) p *len
> $1 = 2048
> (gdb) p remaining
> $2 = 12804161111182623667
> ..
> (gdb) p db->buffer->data
> $5 = 0xe4d8c0d9ec42b70f <Address 0xe4d8c0d9ec42b70f out of bounds>
>
> Looks like the db->buffer struct is overwritten with data, thereby
> invalidating the db->buffer->data pointer.
>
>
> A third issue is that serf is either segfaulting or retrying when the
> server aborts the connection due to this segfault. I'll look into this
> further.
>
> Lieven

Re: Error running context

Posted by Lieven Govaerts <sv...@mobsol.be>.
Hi,

On Fri, Nov 30, 2012 at 8:19 PM, Philip Martin
<ph...@wandisco.com> wrote:
> Stefan Küng <to...@gmail.com> writes:
>
>> Here's how to reproduce:
>>
>> $ svn co https://tortoisesvn.googlecode.com/svn/trunk/src/Resources/tools tools
>>
>> get the file here:
>> https://skydrive.live.com/redir?resid=D000F60A347E5B37!11352
>> and replace the one in 'tools' with this one.
>
> I can reproduce locally by importing tools into a local repository,
> checking out, replacing the file and attempting the commit.  That is
> using serf 1.1.x.  Using serf trunk the commit goes into a loop.
>

I see the same problem in a local repository. With some extra logging
I see that one of the delta windows isn't handled correctly by the
server:

This is svn trunk with serf:
write_handler window: {sview_offset = 102400, sview_len = 102400,
tview_len = 102400, num_ops = 55, src_ops = 27, ops->action =
svn_txdelta_new, new_data = 0x15cbc28}
write_handler window: {sview_offset = 204800, sview_len = 102400,
tview_len = 102400, num_ops = 143, src_ops = 71, ops->action =
svn_txdelta_new, new_data = 0x15c0028}
write_handler window: {sview_offset = 307200, sview_len = 102400,
tview_len = 102400, num_ops = 23, src_ops = 11, ops->action =
svn_txdelta_new, new_data = 0x15be428}
write_handler window: {sview_offset = 0, sview_len = 0, tview_len =
102400, num_ops = 1, src_ops = 0, ops->action = svn_txdelta_new,
new_data = 0x17e8028}

This is svn 1.7.7 with neon:
write_handler window: {sview_offset = 102400, sview_len = 102400,
tview_len = 102400, num_ops = 55, src_ops = 27, ops->action =
svn_txdelta_new, new_data = 0x15cbc28}
write_handler window: {sview_offset = 204800, sview_len = 102400,
tview_len = 102400, num_ops = 143, src_ops = 71, ops->action =
svn_txdelta_new, new_data = 0x15c0028}
write_handler window: {sview_offset = 307200, sview_len = 102400,
tview_len = 102400, num_ops = 23, src_ops = 11, ops->action =
svn_txdelta_new, new_data = 0x15be428}
write_handler window: {sview_offset = 0, sview_len = 0, tview_len =
102400, num_ops = 1, src_ops = 0, ops->action = svn_txdelta_new,
new_data = 0x17e8028}
...

The core issue seems to be introduced in r1390435 as part of the
svndiff optimizations.

Attached patch fixes the issue for me. I don't know how it impacts
other parts of the code, so review is appreciated. The patch still
contains logging so not meant to be applied directly!

> As far as I can tell the problem is the client causing mod_dav_svn to
> SEGV (serf trunk keep retrying and causing multiple SEGVs).  The
> mod_dav_svn stack trace isn't very useful, I'll need a httpd debug
> build:
>
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7fe2c42e7700 (LWP 31534)]
> 0x00007fe2c98245cc in apr_brigade_cleanup () from /usr/lib/libaprutil-1.so.0
> (gdb) bt
> #0  0x00007fe2c98245cc in apr_brigade_cleanup ()
>    from /usr/lib/libaprutil-1.so.0
> #1  0x00007fe2c75258bf in ?? () from /usr/lib/apache2/modules/mod_dav.so
> #2  0x00007fe2c7528960 in ?? () from /usr/lib/apache2/modules/mod_dav.so
> #3  0x00007fe2c9ee51f0 in ap_run_handler ()
> #4  0x00007fe2c9ee563b in ap_invoke_handler ()
> #5  0x00007fe2c9ef5448 in ap_process_request ()
> #6  0x00007fe2c9ef2308 in ?? ()
> #7  0x00007fe2c9eebbb0 in ap_run_process_connection ()
> #8  0x00007fe2c9efb55d in ?? ()
> #9  0x00007fe2c960f597 in ?? () from /usr/lib/libapr-1.so.0
> #10 0x00007fe2c93cbb50 in start_thread (arg=<optimized out>)
>     at pthread_create.c:304
> #11 0x00007fe2c9115a7d in clone ()
>     at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
> #12 0x0000000000000000 in ?? ()
>
> I'd guess it's memory corruption in the server.

Well, besides the client seemingly sending incorrect svndiff windows,
the server should not crash. I got the following stack trace from
httpd in the debugger:

Out of memory - terminating application.

Program received signal SIGABRT, Aborted.
0x00007fff88cd7ce2 in __pthread_kill ()
(gdb) bt
#0  0x00007fff88cd7ce2 in __pthread_kill ()
#1  0x00007fff8381f7d2 in pthread_kill ()
#2  0x00007fff83810a7a in abort ()
#3  0x00000001011ef651 in abort_on_pool_failure (retcode=12) at pool.c:55
#4  0x000000010030e290 in apr_palloc ()
#5  0x00000001012067c7 in svn_stringbuf_create_ensure
(blocksize=12804161111182623672, pool=0x100a72428) at string.c:329
#6  0x0000000101206867 in svn_stringbuf_ncreate (bytes=0x1017dd035
"??", size=12804161111182623667, pool=0x100a72428)
    at string.c:346
#7  0x0000000101199dbe in write_handler (baton=0x100a048b8,
buffer=0x1009bfc48 "????ل$8\001", len=0x7fff5fbff2d8) at svndiff.c:886
#8  0x00000001012011fa in svn_stream_write (stream=0x100a04900,
data=0x1009bfc48 "????ل$8\001", len=0x7fff5fbff2d8) at stream.c:162
#9  0x000000010102d30f in write_stream (stream=0x1009c8ba8,
buf=0x1009bfc48, bufsize=2048) at repos.c:2892
#10 0x00000001007969d4 in dav_handler ()
#11 0x0000000100001cd6 in ap_invoke_handler ()
#12 0x0000000100021433 in ap_process_request ()
#13 0x000000010001eb50 in ap_process_http_connection ()
#14 0x000000010000da28 in ap_process_connection ()
#15 0x0000000100027219 in child_main ()
#16 0x000000010002696a in make_child ()
#17 0x000000010002600b in ap_mpm_run ()
#18 0x0000000100007139 in main ()
(gdb) frame 7
#7  0x0000000101199dbe in write_handler (baton=0x100a048b8,
buffer=0x1009bfc48 "????ل$8\001", len=0x7fff5fbff2d8) at svndiff.c:886
886	      db->buffer =
(gdb) p *len
$1 = 2048
(gdb) p remaining
$2 = 12804161111182623667
..
(gdb) p db->buffer->data
$5 = 0xe4d8c0d9ec42b70f <Address 0xe4d8c0d9ec42b70f out of bounds>

Looks like the db->buffer struct is overwritten with data, thereby
invalidating the db->buffer->data pointer.


A third issue is that serf is either segfaulting or retrying when the
server aborts the connection due to this segfault. I'll look into this
further.

Lieven

Re: Error running context

Posted by Philip Martin <ph...@codematters.co.uk>.
Philip Martin <ph...@wandisco.com> writes:

> Philip Martin <ph...@wandisco.com> writes:
>
>> As far as I can tell the problem is the client causing mod_dav_svn to
>> SEGV (serf trunk keep retrying and causing multiple SEGVs).  The
>> mod_dav_svn stack trace isn't very useful, I'll need a httpd debug
>> build:
>
> Debug build with pool debugging:

valgrind reports:

==8362== Invalid write of size 8
==8362==    at 0x4C2B685: memmove (mc_replace_strmem.c:981)
==8362==    by 0x8C05D9F: apply_window (text_delta.c:810)
==8362==    by 0x90D0741: window_consumer (tree.c:2661)
==8362==    by 0x8C03CFA: write_handler (svndiff.c:879)
==8362==    by 0x8E63D21: svn_stream_write (stream.c:162)
==8362==    by 0x858CE21: write_stream (repos.c:2776)
==8362==    by 0x7B34717: dav_method_put (mod_dav.c:1031)
==8362==    by 0x7B3767F: dav_handler (mod_dav.c:4645)
==8362==    by 0x43DCDF: ap_run_handler (config.c:158)
==8362==    by 0x43E10A: ap_invoke_handler (config.c:376)
==8362==    by 0x44A877: ap_process_request (http_request.c:282)
==8362==    by 0x44799F: ap_process_http_connection (http_core.c:190)
==8362==  Address 0xf8e74b8 is 13,704 bytes inside a block of size 40,960 free'd
==8362==    at 0x4C27D4E: free (vg_replace_malloc.c:427)
==8362==    by 0x589ADD5: pool_clear_debug (apr_pools.c:1576)
==8362==    by 0x589AB7D: pool_destroy_debug (apr_pools.c:1638)
==8362==    by 0x8C03DCE: write_handler (svndiff.c:898)
==8362==    by 0x8E63D21: svn_stream_write (stream.c:162)
==8362==    by 0x858CE21: write_stream (repos.c:2776)
==8362==    by 0x7B34717: dav_method_put (mod_dav.c:1031)
==8362==    by 0x7B3767F: dav_handler (mod_dav.c:4645)
==8362==    by 0x43DCDF: ap_run_handler (config.c:158)
==8362==    by 0x43E10A: ap_invoke_handler (config.c:376)
==8362==    by 0x44A877: ap_process_request (http_request.c:282)
==8362==    by 0x44799F: ap_process_http_connection (http_core.c:190)

-- 
Philip

Re: Error running context

Posted by Philip Martin <ph...@wandisco.com>.
Philip Martin <ph...@wandisco.com> writes:

> As far as I can tell the problem is the client causing mod_dav_svn to
> SEGV (serf trunk keep retrying and causing multiple SEGVs).  The
> mod_dav_svn stack trace isn't very useful, I'll need a httpd debug
> build:

Debug build with pool debugging:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f8f7bc7b700 (LWP 8258)]
__memmove_ssse3 () at ../sysdeps/x86_64/multiarch/memcpy-ssse3.S:296
296	../sysdeps/x86_64/multiarch/memcpy-ssse3.S: No such file or directory.
(gdb) bt
#0  __memmove_ssse3 () at ../sysdeps/x86_64/multiarch/memcpy-ssse3.S:296
#1  0x00007f8f7da75da0 in apply_window (window=0x7f8f7bc7a930, baton=0x20e7c60)
    at ../src/subversion/libsvn_delta/text_delta.c:810
#2  0x00007f8f7d5c3742 in window_consumer (window=0x7f8f7bc7a930, 
    baton=0x20c54b0) at ../src/subversion/libsvn_fs_fs/tree.c:2661
#3  0x00007f8f7da73cfb in write_handler (baton=0x20e95b0, 
    buffer=0x20d03d8  <incomplete sequence \331>, len=0x7f8f7bc7aa28)
    at ../src/subversion/libsvn_delta/svndiff.c:879
#4  0x00007f8f7d82cd22 in svn_stream_write (stream=0x20e9af0, 
    data=0x20d03d8  <incomplete sequence \331>, len=0x7f8f7bc7aa28)
    at ../src/subversion/libsvn_subr/stream.c:162
#5  0x00007f8f7e0eee22 in write_stream (stream=0x20c9590, buf=0x20d03d8, 
    bufsize=1856) at ../src/subversion/mod_dav_svn/repos.c:2776
#6  0x00007f8f7eb2f718 in dav_method_put (r=r@entry=0x207c680)
    at mod_dav.c:1031
#7  0x00007f8f7eb32680 in dav_handler (r=0x207c680) at mod_dav.c:4645
#8  dav_handler (r=0x207c680) at mod_dav.c:4567
#9  0x000000000043dce0 in ap_run_handler (r=0x207c680) at config.c:158
#10 0x000000000043e10b in ap_invoke_handler (r=r@entry=0x207c680)
    at config.c:376
#11 0x000000000044a878 in ap_process_request (r=r@entry=0x207c680)
    at http_request.c:282
#12 0x00000000004479a0 in ap_process_http_connection (c=0x207be80)
    at http_core.c:190
#13 0x0000000000443fc0 in ap_run_process_connection (c=0x207be80)
    at connection.c:43
#14 0x0000000000444358 in ap_process_connection (c=c@entry=0x207be80, 
    csd=csd@entry=0x207bc20) at connection.c:190
#15 0x0000000000450412 in process_socket (bucket_alloc=<optimized out>, 
    my_thread_num=0, my_child_num=0, sock=0x207bc20, p=<optimized out>)
    at worker.c:545
#16 worker_thread (thd=0x207ade0, dummy=<optimized out>) at worker.c:895
#17 0x00007f8f80164b50 in start_thread (arg=<optimized out>)
    at pthread_create.c:304
#18 0x00007f8f7fcaaa7d in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#19 0x0000000000000000 in ?? ()
(gdb) 


-- 
Certified & Supported Apache Subversion Downloads:
http://www.wandisco.com/subversion/download

Re: Error running context

Posted by Philip Martin <ph...@wandisco.com>.
Stefan Küng <to...@gmail.com> writes:

> Here's how to reproduce:
>
> $ svn co https://tortoisesvn.googlecode.com/svn/trunk/src/Resources/tools tools
>
> get the file here:
> https://skydrive.live.com/redir?resid=D000F60A347E5B37!11352
> and replace the one in 'tools' with this one.

I can reproduce locally by importing tools into a local repository,
checking out, replacing the file and attempting the commit.  That is
using serf 1.1.x.  Using serf trunk the commit goes into a loop.

As far as I can tell the problem is the client causing mod_dav_svn to
SEGV (serf trunk keep retrying and causing multiple SEGVs).  The
mod_dav_svn stack trace isn't very useful, I'll need a httpd debug
build:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fe2c42e7700 (LWP 31534)]
0x00007fe2c98245cc in apr_brigade_cleanup () from /usr/lib/libaprutil-1.so.0
(gdb) bt
#0  0x00007fe2c98245cc in apr_brigade_cleanup ()
   from /usr/lib/libaprutil-1.so.0
#1  0x00007fe2c75258bf in ?? () from /usr/lib/apache2/modules/mod_dav.so
#2  0x00007fe2c7528960 in ?? () from /usr/lib/apache2/modules/mod_dav.so
#3  0x00007fe2c9ee51f0 in ap_run_handler ()
#4  0x00007fe2c9ee563b in ap_invoke_handler ()
#5  0x00007fe2c9ef5448 in ap_process_request ()
#6  0x00007fe2c9ef2308 in ?? ()
#7  0x00007fe2c9eebbb0 in ap_run_process_connection ()
#8  0x00007fe2c9efb55d in ?? ()
#9  0x00007fe2c960f597 in ?? () from /usr/lib/libapr-1.so.0
#10 0x00007fe2c93cbb50 in start_thread (arg=<optimized out>)
    at pthread_create.c:304
#11 0x00007fe2c9115a7d in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#12 0x0000000000000000 in ?? ()

I'd guess it's memory corruption in the server.

-- 
Certified & Supported Apache Subversion Downloads:
http://www.wandisco.com/subversion/download