You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Kyle McKay <ma...@gmail.com> on 2009/03/28 23:04:18 UTC
[PATCH] fix for ssh zombies introduced with r35533
[[[
Fix ssh zombie problem introduced with revision 35533
* subversion/libsvn_ra_svn/client.c
(make_tunnel): Fully detach tunnel process to avoid having it
receive signals while restoring the original
apr_pool_note_subprocess to avoid creating zombies.
]]]
As detailed in this thread:
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1445592
regarding the change introduced into client.c by revision 35533:
http://svn.collab.net/viewvc/svn?view=revision&revision=35533
On Mar 27, 2009, at 09:51, Hyrum K. Wright wrote:
> The case which drove r35533 was a user who uses ssh connection
> pooling for svn connections.
Change r35533 removed the call to to apr_pool_note_subprocess meaning
that Subversion never reaps any of its children. This may be okay for
a short-lived process, but is not okay for a long-lived process such
as a GUI tool that has linked with the Subversion library.
If the GUI tool runs long enough, it can create so many un-reaped
zombie processes that system resources are exhausted and it becomes
impossible to spawn any new processes.
The attached patch restores the original call to
apr_pool_note_subprocess thus guaranteeing no zombies are ever
created. It also arranges for the tunnel process to become detached
so that it can "die in piece, in its own time, on its own terms" as
the log comment for revision 35533 mentions, but without causing
zombie processes to be created.
This change retains full compatibility with ssh connection pooling
while eliminating the zombie problem. See issue #2580 for the
original impetus for the change made in revision 35533.
http://subversion.tigris.org/issues/show_bug.cgi?id=2580
Kyle
------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1462473
Re: [PATCH] fix for ssh zombies introduced with r35533
Posted by Gavin Baumanis <ga...@thespidernet.com>.
Ping. This patch has received no comments.
(Discounting Hyrum's - "Your patch has not fallen on deaf ears post").
Gavin.
On 29/03/2009, at 10:04 AM, Kyle McKay wrote:
> [[[
> Fix ssh zombie problem introduced with revision 35533
>
> * subversion/libsvn_ra_svn/client.c
> (make_tunnel): Fully detach tunnel process to avoid having it
> receive signals while restoring the original
> apr_pool_note_subprocess to avoid creating zombies.
> ]]]
>
> As detailed in this thread:
>
> http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1445592
>
> regarding the change introduced into client.c by revision 35533:
>
> http://svn.collab.net/viewvc/svn?view=revision&revision=35533
>
> On Mar 27, 2009, at 09:51, Hyrum K. Wright wrote:
>> The case which drove r35533 was a user who uses ssh connection
>> pooling for svn connections.
>
> Change r35533 removed the call to to apr_pool_note_subprocess meaning
> that Subversion never reaps any of its children. This may be okay for
> a short-lived process, but is not okay for a long-lived process such
> as a GUI tool that has linked with the Subversion library.
>
> If the GUI tool runs long enough, it can create so many un-reaped
> zombie processes that system resources are exhausted and it becomes
> impossible to spawn any new processes.
>
> The attached patch restores the original call to
> apr_pool_note_subprocess thus guaranteeing no zombies are ever
> created. It also arranges for the tunnel process to become detached
> so that it can "die in piece, in its own time, on its own terms" as
> the log comment for revision 35533 mentions, but without causing
> zombie processes to be created.
>
> This change retains full compatibility with ssh connection pooling
> while eliminating the zombie problem. See issue #2580 for the
> original impetus for the change made in revision 35533.
>
> http://subversion.tigris.org/issues/show_bug.cgi?id=2580
>
> Kyle
>
> ------------------------------------------------------
> http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1462473
> <client.c-patch>
------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1688806
Re: [PATCH] fix for ssh zombies introduced with r35533
Posted by James Y Knight <fo...@fuhm.net>.
On Apr 21, 2009, at 7:15 AM, Greg Hudson wrote:
> It already does this; try running svnserve -t and hitting ctrl-D to
> terminate the input stream, and you'll see that it exits.
>
> (I don't know why that's not enough to solve the problem.)
I expect it's because svnserve doesn't monitor the input stream for
EOF while it's blocking trying to write to the output stream.
James
------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=1848217
To unsubscribe from this discussion, e-mail: [users-unsubscribe@subversion.tigris.org].
Re: [PATCH] fix for ssh zombies introduced with r35533
Posted by Kyle McKay <ma...@gmail.com>.
On Apr 21, 2009, at 04:15, Greg Hudson wrote:
> On Tue, 2009-04-21 at 00:34 -0400, James Y Knight wrote:
>> Perhaps the svnserve
>> process to notice that its *input* pipe had been closed, and abort
>> the
>> connection.
>
> It already does this; try running svnserve -t and hitting ctrl-D to
> terminate the input stream, and you'll see that it exits.
>
> (I don't know why that's not enough to solve the problem.)
It's not enough because Apple's sshd codebase is broken. I have filed
bug report rdar://6810722 with Apple about this. Here are some
details from an email exchange I've been having:
On Apr 20, 2009, at 23:53, Damien Miller wrote:
> On Mon, 20 Apr 2009, Kyle McKay wrote:
>
>> Thanks for your response.
>>
>> On Apr 20, 2009, at 23:10, Damien Miller wrote:
>>>
>>> This doesn't look like bug#85 based on what you sent below. It looks
>>> more like a whatever that you were running failing to notice its
>>> stdout
>>> closing.
>>
>> Actually after looking at this some more, I think Apple's sshd is
>> just
>> missing the Bug 85 patch and I think the debug messages back this up.
>
> No, Apple's sshd clearly does have the bug #85 patch:
>
>> debug2: channel 0: rcvd eow
>
> Would not be printed otherwise. Perhaps they have forcibly disabled
> USE_PIPES in session.c? It is required for the patch to correctly
> function.
>
>> And in this case sshd included the Bug 85 fix and so was using
>> pipes to
>> communicate with the child and the child therefore got a SIGPIPE on
>> the input
>> open -> closed transition and exited.
>
> No, in both cases the signalling between sshd and the child process is
> identical from the logs. There may be differences depending on whether
> Apple has modified their sshd to avoid using pipes (thwarting the
> half-close fix in the process), but you will need to post a full debug
> log from the server to tell.
>
> -d
Yup, you're right. Apple has disabled the USE_PIPES define in their
source code.
Here's the original file:
http://www.opensource.apple.com/darwinsource/10.5.6/OpenSSH-95.1.5/openssh/session.c.orig
And the file they're using:
http://www.opensource.apple.com/darwinsource/10.5.6/OpenSSH-95.1.5/openssh/session.c
Here's the diff:
--- session.c.orig
+++ session.c
@@ -424,7 +424,9 @@
}
}
+#ifndef __APPLE__
#define USE_PIPES
+#endif
/*
* This is called to fork and execute a command when we have no tty.
This
* will call do_child from the child, and server_loop from the parent
after
@@ -2082,8 +2084,10 @@
n_bytes = packet_remaining();
tty_parse_modes(s->ttyfd, &n_bytes);
+#ifndef __APPLE_PRIVPTY__
if (!use_privsep)
pty_setowner(s->pw, s->tty);
+#endif
/* Set window size from the packet. */
pty_change_window_size(s->ptyfd, s->row, s->col, s->xpixel, s->ypixel);
@@ -2321,9 +2325,11 @@
if (s->pid != 0)
record_logout(s->pid, s->tty, s->pw->pw_name);
+#ifndef __APPLE_PRIVPTY__
/* Release the pseudo-tty. */
if (getuid() == 0)
pty_release(s->tty);
+#endif
/*
* Close the server side of the socket pairs. We must do this after
Thanks for your help.
Kyle
------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=1842264
To unsubscribe from this discussion, e-mail: [users-unsubscribe@subversion.tigris.org].
Re: [PATCH] fix for ssh zombies introduced with r35533
Posted by Greg Hudson <gh...@mit.edu>.
On Tue, 2009-04-21 at 00:34 -0400, James Y Knight wrote:
> Perhaps the svnserve
> process to notice that its *input* pipe had been closed, and abort the
> connection.
It already does this; try running svnserve -t and hitting ctrl-D to
terminate the input stream, and you'll see that it exits.
(I don't know why that's not enough to solve the problem.)
------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1841824
Re: [PATCH] fix for ssh zombies introduced with r35533
Posted by James Y Knight <fo...@fuhm.net>.
On Apr 20, 2009, at 10:46 PM, Kyle McKay wrote:
> Thanks for your details. I can now reproduce the problem you describe
> provided I'm using a version of OpenSSH sshd that does not contain the
> fix for OpenSSH Bug 85 and the ssh client connects to it using
> Protocol 2:
>
> https://bugzilla.mindrot.org/show_bug.cgi?id=85
>
> The bug is actually endemic to the sshd code, not the ssh code.
>
> To test whether or not any particular sshd suffers from the bug, do
> this:
>
> ssh -2 <ssh_host_to_test> yes | true
Wow, NICE job tracking that down!
This is rather insidious, because it's actually an SSHv2 protocol
specification deficiency, not actually an implementation bug. Thus, it
occurs if *either* the client *or* the server isn't patched, e.g., a
OpenSSH_3.9p1 connecting to a OpenSSH_5.1p1 machine still sees this
hang. Furthermore, the fix to the hang is an OpenSSH-specific protocol
extension, which OpenSSH will only send to other OpenSSH
implementations.
See "3. connection: Channel write close extension" in:
http://www.openbsd.org/cgi-bin/cvsweb/src/usr.bin/ssh/PROTOCOL?rev=1.12;content-type=text%2Fplain
> Whether or not to fix the zombie problem (it should definitely be
> fixed ASAP) is a different issue from whether or not to revert support
> for issue 2580. Deciding to revert issue 2580 support also happens to
> fix the zombie problem.
Indeed.
Despite all the above talk about ssh, I think ultimately this truly is
a subversion bug. SVN is responsible for both ends of the connection.
When the subversion client gets a SIGPIPE on its output, it can do
something to make sure it cleans up properly. One way to do this is to
kill the ssh client process, like it did before.
There are other alternatives, although they're probably enough work
that they're not doable in the 1.6 patch series. Perhaps the svnserve
process to notice that its *input* pipe had been closed, and abort the
connection. This, ssh *does* support: "ssh hostname cat < /dev/null"
never hangs on any version of ssh I know of, demonstrating that if
svnserve similarly noticed a closed input, it could properly shut
down. Or perhaps the client could upon receipt of a signal, not exit
immediately, but instead send a "cancel" command to the server,
read&discard further data from ssh, and wait for the ssh process to
exit first. Or perhaps there's yet another solution.
Of course, Subversion *could* simply declare that it requires that you
use OpenSSH >= 5.1, but IMO that isn't very sensible. Not only is this
version rather new and many people aren't using it yet, but it really
is an *OpenSSH* dependency. Other clients or servers won't do. So, I
do think the change made for issue 2580 should be reverted...at least
for now.
At my workplace, nearly everything is using ssh versions < 5.1, and it
would cause serious hurt for the svn server if people used unpatched
subversion 1.6 clients which don't shutdown their ssh tunnels. I
already reverted r35533 in the subversion build I created, and told
everyone else to do the same if they make custom builds. But I imagine
the same issue will occur for many other organizations.
James
------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1836728
Re: [PATCH] fix for ssh zombies introduced with r35533
Posted by James Y Knight <fo...@fuhm.net>.
On Apr 20, 2009, at 10:46 PM, Kyle McKay wrote:
> Thanks for your details. I can now reproduce the problem you describe
> provided I'm using a version of OpenSSH sshd that does not contain the
> fix for OpenSSH Bug 85 and the ssh client connects to it using
> Protocol 2:
>
> https://bugzilla.mindrot.org/show_bug.cgi?id=85
>
> The bug is actually endemic to the sshd code, not the ssh code.
>
> To test whether or not any particular sshd suffers from the bug, do
> this:
>
> ssh -2 <ssh_host_to_test> yes | true
Wow, NICE job tracking that down!
This is rather insidious, because it's actually an SSHv2 protocol
specification deficiency, not actually an implementation bug. Thus, it
occurs if *either* the client *or* the server isn't patched, e.g., a
OpenSSH_3.9p1 connecting to a OpenSSH_5.1p1 machine still sees this
hang. Furthermore, the fix to the hang is an OpenSSH-specific protocol
extension, which OpenSSH will only send to other OpenSSH
implementations.
See "3. connection: Channel write close extension" in:
http://www.openbsd.org/cgi-bin/cvsweb/src/usr.bin/ssh/PROTOCOL?rev=1.12;content-type=text%2Fplain
> Whether or not to fix the zombie problem (it should definitely be
> fixed ASAP) is a different issue from whether or not to revert support
> for issue 2580. Deciding to revert issue 2580 support also happens to
> fix the zombie problem.
Indeed.
Despite all the above talk about ssh, I think ultimately this truly is
a subversion bug. SVN is responsible for both ends of the connection.
When the subversion client gets a SIGPIPE on its output, it can do
something to make sure it cleans up properly. One way to do this is to
kill the ssh client process, like it did before.
There are other alternatives, although they're probably enough work
that they're not doable in the 1.6 patch series. Perhaps the svnserve
process to notice that its *input* pipe had been closed, and abort the
connection. This, ssh *does* support: "ssh hostname cat < /dev/null"
never hangs on any version of ssh I know of, demonstrating that if
svnserve similarly noticed a closed input, it could properly shut
down. Or perhaps the client could upon receipt of a signal, not exit
immediately, but instead send a "cancel" command to the server,
read&discard further data from ssh, and wait for the ssh process to
exit first. Or perhaps there's yet another solution.
Of course, Subversion *could* simply declare that it requires that you
use OpenSSH >= 5.1, but IMO that isn't very sensible. Not only is this
version rather new and many people aren't using it yet, but it really
is an *OpenSSH* dependency. Other clients or servers won't do. So, I
do think the change made for issue 2580 should be reverted...at least
for now.
At my workplace, nearly everything is using ssh versions < 5.1, and it
would cause serious hurt for the svn server if people used unpatched
subversion 1.6 clients which don't shutdown their ssh tunnels. I
already reverted r35533 in the subversion build I created, and told
everyone else to do the same if they make custom builds. But I imagine
the same issue will occur for many other organizations.
James
------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=1836729
To unsubscribe from this discussion, e-mail: [users-unsubscribe@subversion.tigris.org].
Re: [PATCH] fix for ssh zombies introduced with r35533
Posted by Kyle McKay <ma...@gmail.com>.
James,
Thanks for your details. I can now reproduce the problem you describe
provided I'm using a version of OpenSSH sshd that does not contain the
fix for OpenSSH Bug 85 and the ssh client connects to it using
Protocol 2:
https://bugzilla.mindrot.org/show_bug.cgi?id=85
The bug is actually endemic to the sshd code, not the ssh code.
To test whether or not any particular sshd suffers from the bug, do
this:
ssh -2 <ssh_host_to_test> yes | true
If after responding to any prompts/entering any passwords, the
connection hangs, the sshd server at <ssh_host_to_test> suffers from
the bug. Note that the client must connect with Protocol 2 (which is
generally the default nowadays). If the client connects with Protocol
1 (which is often disabled nowadays), the bug does not happen.
Some results:
Up-to-date Ubuntu 8.10: Bug 85 fixed
Up-to-date Mac OS X 10.5.6: Bug 85 PRESENT
Up-to-date MacPorts sshd: Bug 85 fixed
The other reason I wasn't able to duplicate the bug initially is that
apparently the following:
svn log svn+ssh://<some_host>/<some_repos> | head
has a race condition. If the results coming back from the remote
svnserve -t instance manage to be sent completely and buffered in
memory before head closes the pipe, you don't see a problem.
For example this often demonstrates the problem:
svn log svn+ssh://<some_host>/<some_repos> | true
whereas this often does not (provided you wait patiently for sleep to
exit without interrupting it):
svn log svn+ssh://<some_host>/<some_repos> | sleep 15
So to sum it up, the problem is an already-fixed OpenSSH bug distinct
from Subversion issue 2580. Fixing Subversion issue 2580 merely
exposes the OpenSSH bug (provided you have an old sshd that doesn't
contain the fix).
The patch I added to 2580 is not intended to fix the OpenSSH problem,
it is only intended to eliminate the zombie problem introduced by
change 35533.
Whether or not to fix the zombie problem (it should definitely be
fixed ASAP) is a different issue from whether or not to revert support
for issue 2580. Deciding to revert issue 2580 support also happens to
fix the zombie problem.
Kyle
On Apr 19, 2009, at 20:52, James Y Knight wrote:
> On Apr 19, 2009, at 10:45 PM, Kyle McKay wrote:
>
>> I cannot duplicate the problem described by James using the released
>> version of Subversion 1.6.1 with the patch I attached to issue 2580
>> applied:
>>
>> http://subversion.tigris.org/issues/show_bug.cgi?id=2580
>
> I see the issue both with and without your patch attached to issue
> 2580 applied, which is to be expected, as your patch does not
> attempt to fix anything but the issue of "zombie" processes. These
> aren't zombie processes, but real ssh processes sitting around doing
> nothing long after the svn process has disappeared.
------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=1835189
To unsubscribe from this discussion, e-mail: [users-unsubscribe@subversion.tigris.org].
Re: [PATCH] fix for ssh zombies introduced with r35533
Posted by Kyle McKay <ma...@gmail.com>.
James,
Thanks for your details. I can now reproduce the problem you describe
provided I'm using a version of OpenSSH sshd that does not contain the
fix for OpenSSH Bug 85 and the ssh client connects to it using
Protocol 2:
https://bugzilla.mindrot.org/show_bug.cgi?id=85
The bug is actually endemic to the sshd code, not the ssh code.
To test whether or not any particular sshd suffers from the bug, do
this:
ssh -2 <ssh_host_to_test> yes | true
If after responding to any prompts/entering any passwords, the
connection hangs, the sshd server at <ssh_host_to_test> suffers from
the bug. Note that the client must connect with Protocol 2 (which is
generally the default nowadays). If the client connects with Protocol
1 (which is often disabled nowadays), the bug does not happen.
Some results:
Up-to-date Ubuntu 8.10: Bug 85 fixed
Up-to-date Mac OS X 10.5.6: Bug 85 PRESENT
Up-to-date MacPorts sshd: Bug 85 fixed
The other reason I wasn't able to duplicate the bug initially is that
apparently the following:
svn log svn+ssh://<some_host>/<some_repos> | head
has a race condition. If the results coming back from the remote
svnserve -t instance manage to be sent completely and buffered in
memory before head closes the pipe, you don't see a problem.
For example this often demonstrates the problem:
svn log svn+ssh://<some_host>/<some_repos> | true
whereas this often does not (provided you wait patiently for sleep to
exit without interrupting it):
svn log svn+ssh://<some_host>/<some_repos> | sleep 15
So to sum it up, the problem is an already-fixed OpenSSH bug distinct
from Subversion issue 2580. Fixing Subversion issue 2580 merely
exposes the OpenSSH bug (provided you have an old sshd that doesn't
contain the fix).
The patch I added to 2580 is not intended to fix the OpenSSH problem,
it is only intended to eliminate the zombie problem introduced by
change 35533.
Whether or not to fix the zombie problem (it should definitely be
fixed ASAP) is a different issue from whether or not to revert support
for issue 2580. Deciding to revert issue 2580 support also happens to
fix the zombie problem.
Kyle
On Apr 19, 2009, at 20:52, James Y Knight wrote:
> On Apr 19, 2009, at 10:45 PM, Kyle McKay wrote:
>
>> I cannot duplicate the problem described by James using the released
>> version of Subversion 1.6.1 with the patch I attached to issue 2580
>> applied:
>>
>> http://subversion.tigris.org/issues/show_bug.cgi?id=2580
>
> I see the issue both with and without your patch attached to issue
> 2580 applied, which is to be expected, as your patch does not
> attempt to fix anything but the issue of "zombie" processes. These
> aren't zombie processes, but real ssh processes sitting around doing
> nothing long after the svn process has disappeared.
------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1835186
Re: [PATCH] fix for ssh zombies introduced with r35533
Posted by James Y Knight <fo...@fuhm.net>.
On Apr 19, 2009, at 10:45 PM, Kyle McKay wrote:
> I cannot duplicate the problem described by James using the released
> version of Subversion 1.6.1 with the patch I attached to issue 2580
> applied:
>
> http://subversion.tigris.org/issues/show_bug.cgi?id=2580
I see the issue both with and without your patch attached to issue
2580 applied, which is to be expected, as your patch does not attempt
to fix anything but the issue of "zombie" processes. These aren't
zombie processes, but real ssh processes sitting around doing nothing
long after the svn process has disappeared.
To reiterate: with svn 1.6 with or without your patch applied, with
ssh connection pooling disabled, this command:
svn log svn+ssh://hostname/repo|head
causes the ssh process on the client, as well as an ssh and svnserve
on the server to hang around forever, as "real" processes.
Did you try reproducing with svn connection pooling disabled?
> Admittedly both of these tests were run on Mac OS X 10.5.6 (which
> claims to be an Open Brand UNIX 03 Registered Product, conforming to
> the SUSv3 and POSIX 1003.1 specifications for the C API, Shell
> Utilities, and Threads) -- I don't know what system James was using
> and it may behave differently.
I originally tested with a Linux CentOS5 system, which has
OpenSSH_4.3p2, OpenSSL 0.9.8b 04 May 2006
I've also just run the same test on OSX Leopard 10.5.6, which has
OpenSSH_5.1p1, OpenSSL 0.9.7l 28 Sep 2006.
To attempt to make sure a server-side configuration was not the issue,
I also tried against a number of different servers, including CentOS
5, Debian Lenny, and the OSX Leopard 10.5.6 system (all talking to the
default subversion server coming with each system, not a newly built
1.6) . All exhibited the same behavior. This does not appear to be
system specific, and I would be astounded if this problem did not
occur for everyone.
More details of the test I just ran on OSX:
With a fresh compile of svn 1.6.1 + svn 1.6.1 deps tarballs,
unmodified. I have no ~/.ssh/config file, and /etc/ssh_config is empty
except for comments. Similarly, no ~/.subversion/config.
First I made a test repository:
svnadmin create /tmp/svnthing
for x in {0001..1000}; do svn mkdir file:///tmp/svnthing/$x -m "Making
new directory"; done
Then, in /tmp/subversion-1.6.1, where I just built svn, I run:
./subversion/svn/svn log svn+ssh://localhost/tmp/svnthing | head
It prints:
------------------------------------------------------------------------
r1000 | jknight | 2009-04-19 23:40:02 -0400 (Sun, 19 Apr 2009) | 1 line
Making new directory
------------------------------------------------------------------------
r999 | jknight | 2009-04-19 23:40:02 -0400 (Sun, 19 Apr 2009) | 1 line
Making new directory
------------------------------------------------------------------------
r998 | jknight | 2009-04-19 23:40:02 -0400 (Sun, 19 Apr 2009) | 1 line
svn: Write error: Broken pipe
I then end up with a bunch of processes hanging around. As far as I
know, they'll be there forever (1 for the client, 3 for the server).
jknight 75255 0.0 0.1 75736 1020 s003 S 11:43PM 0:00.02
ssh localhost svnserve -t
root 75257 0.0 0.2 79596 1624 ?? S 11:43PM
0:00.05 /usr/sbin/sshd -i
jknight 75260 0.0 0.1 79692 576 ?? S 11:43PM
0:00.00 /usr/sbin/sshd -i
jknight 75261 0.0 0.1 77392 1200 ?? Ss 11:43PM 0:00.02
svnserve -t
James
------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1817626
Re: [PATCH] fix for ssh zombies introduced with r35533
Posted by James Y Knight <fo...@fuhm.net>.
On Apr 19, 2009, at 10:45 PM, Kyle McKay wrote:
> I cannot duplicate the problem described by James using the released
> version of Subversion 1.6.1 with the patch I attached to issue 2580
> applied:
>
> http://subversion.tigris.org/issues/show_bug.cgi?id=2580
I see the issue both with and without your patch attached to issue
2580 applied, which is to be expected, as your patch does not attempt
to fix anything but the issue of "zombie" processes. These aren't
zombie processes, but real ssh processes sitting around doing nothing
long after the svn process has disappeared.
To reiterate: with svn 1.6 with or without your patch applied, with
ssh connection pooling disabled, this command:
svn log svn+ssh://hostname/repo|head
causes the ssh process on the client, as well as an ssh and svnserve
on the server to hang around forever, as "real" processes.
Did you try reproducing with svn connection pooling disabled?
> Admittedly both of these tests were run on Mac OS X 10.5.6 (which
> claims to be an Open Brand UNIX 03 Registered Product, conforming to
> the SUSv3 and POSIX 1003.1 specifications for the C API, Shell
> Utilities, and Threads) -- I don't know what system James was using
> and it may behave differently.
I originally tested with a Linux CentOS5 system, which has
OpenSSH_4.3p2, OpenSSL 0.9.8b 04 May 2006
I've also just run the same test on OSX Leopard 10.5.6, which has
OpenSSH_5.1p1, OpenSSL 0.9.7l 28 Sep 2006.
To attempt to make sure a server-side configuration was not the issue,
I also tried against a number of different servers, including CentOS
5, Debian Lenny, and the OSX Leopard 10.5.6 system (all talking to the
default subversion server coming with each system, not a newly built
1.6) . All exhibited the same behavior. This does not appear to be
system specific, and I would be astounded if this problem did not
occur for everyone.
More details of the test I just ran on OSX:
With a fresh compile of svn 1.6.1 + svn 1.6.1 deps tarballs,
unmodified. I have no ~/.ssh/config file, and /etc/ssh_config is empty
except for comments. Similarly, no ~/.subversion/config.
First I made a test repository:
svnadmin create /tmp/svnthing
for x in {0001..1000}; do svn mkdir file:///tmp/svnthing/$x -m "Making
new directory"; done
Then, in /tmp/subversion-1.6.1, where I just built svn, I run:
./subversion/svn/svn log svn+ssh://localhost/tmp/svnthing | head
It prints:
------------------------------------------------------------------------
r1000 | jknight | 2009-04-19 23:40:02 -0400 (Sun, 19 Apr 2009) | 1 line
Making new directory
------------------------------------------------------------------------
r999 | jknight | 2009-04-19 23:40:02 -0400 (Sun, 19 Apr 2009) | 1 line
Making new directory
------------------------------------------------------------------------
r998 | jknight | 2009-04-19 23:40:02 -0400 (Sun, 19 Apr 2009) | 1 line
svn: Write error: Broken pipe
I then end up with a bunch of processes hanging around. As far as I
know, they'll be there forever (1 for the client, 3 for the server).
jknight 75255 0.0 0.1 75736 1020 s003 S 11:43PM 0:00.02
ssh localhost svnserve -t
root 75257 0.0 0.2 79596 1624 ?? S 11:43PM
0:00.05 /usr/sbin/sshd -i
jknight 75260 0.0 0.1 79692 576 ?? S 11:43PM
0:00.00 /usr/sbin/sshd -i
jknight 75261 0.0 0.1 77392 1200 ?? Ss 11:43PM 0:00.02
svnserve -t
James
------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=1817627
To unsubscribe from this discussion, e-mail: [users-unsubscribe@subversion.tigris.org].
Re: [PATCH] fix for ssh zombies introduced with r35533
Posted by Kyle McKay <ma...@gmail.com>.
Gavin,
I cannot duplicate the problem described by James using the released
version of Subversion 1.6.1 with the patch I attached to issue 2580
applied:
http://subversion.tigris.org/issues/show_bug.cgi?id=2580
Using the technique in his thread you reference below (assuming that
ssh connection pooling has been enabled as previously described), I
tried two scenarios:
1. Run the test script and ssh into the machine hosting the repository
while the test script is running. It might take a few tries to get
the secondary ssh to pick up the shared connection (because it will be
torn down and re-setup in between each svn log connection), but it can
be done.
In this case you can see the ssh svnserve -t process hanging around to
support the ssh login even after the test script has stopped. When
you exit the ssh login you see a message "Shared connection to ...
closed." This is confirmation that the ssh login was using a shared
connection created by the svn log test.
At this point the ssh svnserve -t process that was hanging around to
support the ssh login has exited for me.
2. ssh into the machine hosting the repository BEFORE running the test
script. Run the test script (notice how much faster it runs) all the
svn log commands appear to be using the shared connection. Now exit
the ssh login.
The login exits cleanly for me and does not leave any ssh process
hanging around.
Admittedly both of these tests were run on Mac OS X 10.5.6 (which
claims to be an Open Brand UNIX 03 Registered Product, conforming to
the SUSv3 and POSIX 1003.1 specifications for the C API, Shell
Utilities, and Threads) -- I don't know what system James was using
and it may behave differently.
Probably the only way to make everyone happy is to do the following:
A. Add a configuration item to the [tunnels] section
tunnel-agent-handling = kill | term | detach
Where the default is "kill" if not specified. This would be the
behavior (and code path) identical to svn version 1.5.6 and earlier
(APR_KILL_ALWAYS). "term" would have the identical code path to
"kill" except use APR_KILL_AFTER_TIMEOUT instead. "detach" would
modify the behavior as follows: For WIN32 systems APR_KILL_NEVER is
used while on other systems APR_KILL_ALWAYS is used AND the additional
code added by the patch I posted to issue 2580 becomes active (in the
other cases, "kill" and "term", that code would be inactive).
B. Allow the ./configure script to change the default for tunnel-agent-
handling if not specified from "kill" to either of the other options.
This would achieve the following:
1. 1.5.6 and earlier behavior restored as the default
2. No zombies created, ever
3. Debian can stop patching the svn distribution and just use a ./
configure option instead
I might be willing to do this work provided one of the committers
stands up and publicly commits to seeing the patches actually
submitted to the repository (as patch manager perhaps you could do
this?). While the svn project is indeed open source, it is not open
to external commits and without a public commitment from one of the
internal folks to see the patches are actually applied to trunk, I
can't see spending any time on creating them if they're just going to
sit around unapplied.
Kyle
P.S. Connection pooling is still possible in svn 1.5.6 and earlier on
unix-like systems without premature connection termination using the
simple workaround described here:
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1452200
On Apr 19, 2009, at 16:51, Gavin 'Beau' Baumanis wrote:
> Hi Kyle,
>
> I have noticed that your email with patch, is dated March 30, 2009
>
> and the following thread by James Knight:
>
> http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1523947
>
> Is dated April 02, 2009.
>
> I can't quite follow if James' reply is in response to your patch
> submission in THIS thread or the workaround suggestions in the above
> hyperlink.
>
> So, Can you tell me if your patch is still valid, despite James'
> comments?
> Does it require more work / comments from the list?
> Is it no longer valid as a result of James' comments and, so at
> least for the time being, there is nothing more you would like /
> need me to do with your patch (as Patch Manager)?
>
>
> Gavin.
>
>
> On 29/03/2009, at 10:04 AM, Kyle McKay wrote:
>
>> [[[
>> Fix ssh zombie problem introduced with revision 35533
>>
>> * subversion/libsvn_ra_svn/client.c
>> (make_tunnel): Fully detach tunnel process to avoid having it
>> receive signals while restoring the original
>> apr_pool_note_subprocess to avoid creating zombies.
>> ]]]
>>
>> As detailed in this thread:
>>
>> http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1445592
>>
>> regarding the change introduced into client.c by revision 35533:
>>
>> http://svn.collab.net/viewvc/svn?view=revision&revision=35533
>>
>> On Mar 27, 2009, at 09:51, Hyrum K. Wright wrote:
>>> The case which drove r35533 was a user who uses ssh connection
>>> pooling for svn connections.
>>
>> Change r35533 removed the call to to apr_pool_note_subprocess meaning
>> that Subversion never reaps any of its children. This may be okay
>> for
>> a short-lived process, but is not okay for a long-lived process such
>> as a GUI tool that has linked with the Subversion library.
>>
>> If the GUI tool runs long enough, it can create so many un-reaped
>> zombie processes that system resources are exhausted and it becomes
>> impossible to spawn any new processes.
>>
>> The attached patch restores the original call to
>> apr_pool_note_subprocess thus guaranteeing no zombies are ever
>> created. It also arranges for the tunnel process to become detached
>> so that it can "die in piece, in its own time, on its own terms" as
>> the log comment for revision 35533 mentions, but without causing
>> zombie processes to be created.
>>
>> This change retains full compatibility with ssh connection pooling
>> while eliminating the zombie problem. See issue #2580 for the
>> original impetus for the change made in revision 35533.
>>
>> http://subversion.tigris.org/issues/show_bug.cgi?id=2580
>>
>> Kyle
>>
>> ------------------------------------------------------
>> http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1462473
>> <client.c-patch>
>
------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1816865
Re: [PATCH] fix for ssh zombies introduced with r35533
Posted by Kyle McKay <ma...@gmail.com>.
Gavin,
I cannot duplicate the problem described by James using the released
version of Subversion 1.6.1 with the patch I attached to issue 2580
applied:
http://subversion.tigris.org/issues/show_bug.cgi?id=2580
Using the technique in his thread you reference below (assuming that
ssh connection pooling has been enabled as previously described), I
tried two scenarios:
1. Run the test script and ssh into the machine hosting the repository
while the test script is running. It might take a few tries to get
the secondary ssh to pick up the shared connection (because it will be
torn down and re-setup in between each svn log connection), but it can
be done.
In this case you can see the ssh svnserve -t process hanging around to
support the ssh login even after the test script has stopped. When
you exit the ssh login you see a message "Shared connection to ...
closed." This is confirmation that the ssh login was using a shared
connection created by the svn log test.
At this point the ssh svnserve -t process that was hanging around to
support the ssh login has exited for me.
2. ssh into the machine hosting the repository BEFORE running the test
script. Run the test script (notice how much faster it runs) all the
svn log commands appear to be using the shared connection. Now exit
the ssh login.
The login exits cleanly for me and does not leave any ssh process
hanging around.
Admittedly both of these tests were run on Mac OS X 10.5.6 (which
claims to be an Open Brand UNIX 03 Registered Product, conforming to
the SUSv3 and POSIX 1003.1 specifications for the C API, Shell
Utilities, and Threads) -- I don't know what system James was using
and it may behave differently.
Probably the only way to make everyone happy is to do the following:
A. Add a configuration item to the [tunnels] section
tunnel-agent-handling = kill | term | detach
Where the default is "kill" if not specified. This would be the
behavior (and code path) identical to svn version 1.5.6 and earlier
(APR_KILL_ALWAYS). "term" would have the identical code path to
"kill" except use APR_KILL_AFTER_TIMEOUT instead. "detach" would
modify the behavior as follows: For WIN32 systems APR_KILL_NEVER is
used while on other systems APR_KILL_ALWAYS is used AND the additional
code added by the patch I posted to issue 2580 becomes active (in the
other cases, "kill" and "term", that code would be inactive).
B. Allow the ./configure script to change the default for tunnel-agent-
handling if not specified from "kill" to either of the other options.
This would achieve the following:
1. 1.5.6 and earlier behavior restored as the default
2. No zombies created, ever
3. Debian can stop patching the svn distribution and just use a ./
configure option instead
I might be willing to do this work provided one of the committers
stands up and publicly commits to seeing the patches actually
submitted to the repository (as patch manager perhaps you could do
this?). While the svn project is indeed open source, it is not open
to external commits and without a public commitment from one of the
internal folks to see the patches are actually applied to trunk, I
can't see spending any time on creating them if they're just going to
sit around unapplied.
Kyle
P.S. Connection pooling is still possible in svn 1.5.6 and earlier on
unix-like systems without premature connection termination using the
simple workaround described here:
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1452200
On Apr 19, 2009, at 16:51, Gavin 'Beau' Baumanis wrote:
> Hi Kyle,
>
> I have noticed that your email with patch, is dated March 30, 2009
>
> and the following thread by James Knight:
>
> http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1523947
>
> Is dated April 02, 2009.
>
> I can't quite follow if James' reply is in response to your patch
> submission in THIS thread or the workaround suggestions in the above
> hyperlink.
>
> So, Can you tell me if your patch is still valid, despite James'
> comments?
> Does it require more work / comments from the list?
> Is it no longer valid as a result of James' comments and, so at
> least for the time being, there is nothing more you would like /
> need me to do with your patch (as Patch Manager)?
>
>
> Gavin.
>
>
> On 29/03/2009, at 10:04 AM, Kyle McKay wrote:
>
>> [[[
>> Fix ssh zombie problem introduced with revision 35533
>>
>> * subversion/libsvn_ra_svn/client.c
>> (make_tunnel): Fully detach tunnel process to avoid having it
>> receive signals while restoring the original
>> apr_pool_note_subprocess to avoid creating zombies.
>> ]]]
>>
>> As detailed in this thread:
>>
>> http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1445592
>>
>> regarding the change introduced into client.c by revision 35533:
>>
>> http://svn.collab.net/viewvc/svn?view=revision&revision=35533
>>
>> On Mar 27, 2009, at 09:51, Hyrum K. Wright wrote:
>>> The case which drove r35533 was a user who uses ssh connection
>>> pooling for svn connections.
>>
>> Change r35533 removed the call to to apr_pool_note_subprocess meaning
>> that Subversion never reaps any of its children. This may be okay
>> for
>> a short-lived process, but is not okay for a long-lived process such
>> as a GUI tool that has linked with the Subversion library.
>>
>> If the GUI tool runs long enough, it can create so many un-reaped
>> zombie processes that system resources are exhausted and it becomes
>> impossible to spawn any new processes.
>>
>> The attached patch restores the original call to
>> apr_pool_note_subprocess thus guaranteeing no zombies are ever
>> created. It also arranges for the tunnel process to become detached
>> so that it can "die in piece, in its own time, on its own terms" as
>> the log comment for revision 35533 mentions, but without causing
>> zombie processes to be created.
>>
>> This change retains full compatibility with ssh connection pooling
>> while eliminating the zombie problem. See issue #2580 for the
>> original impetus for the change made in revision 35533.
>>
>> http://subversion.tigris.org/issues/show_bug.cgi?id=2580
>>
>> Kyle
>>
>> ------------------------------------------------------
>> http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1462473
>> <client.c-patch>
>
------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=1816867
To unsubscribe from this discussion, e-mail: [users-unsubscribe@subversion.tigris.org].
Re: [PATCH] fix for ssh zombies introduced with r35533
Posted by Gavin Baumanis <ga...@thespidernet.com>.
Hi Kyle,
I have noticed that your email with patch, is dated March 30, 2009
and the following thread by James Knight;
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1523947
Is dated April 02, 2009.
I can't quite follow if James' reply is in response to your patch
submission in THIS thread or the workaround suggestions in the above
hyperlink.
So, Can you tell me if your patch is still valid, despite James'
comments?
Does it require more work / comments from the list?
Is it no longer valid as a result of James' comments and, so at least
for the time being, there is nothing more you would like / need me to
do with your patch (as Patch Manager)?
Gavin.
On 29/03/2009, at 10:04 AM, Kyle McKay wrote:
> [[[
> Fix ssh zombie problem introduced with revision 35533
>
> * subversion/libsvn_ra_svn/client.c
> (make_tunnel): Fully detach tunnel process to avoid having it
> receive signals while restoring the original
> apr_pool_note_subprocess to avoid creating zombies.
> ]]]
>
> As detailed in this thread:
>
> http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1445592
>
> regarding the change introduced into client.c by revision 35533:
>
> http://svn.collab.net/viewvc/svn?view=revision&revision=35533
>
> On Mar 27, 2009, at 09:51, Hyrum K. Wright wrote:
>> The case which drove r35533 was a user who uses ssh connection
>> pooling for svn connections.
>
> Change r35533 removed the call to to apr_pool_note_subprocess meaning
> that Subversion never reaps any of its children. This may be okay for
> a short-lived process, but is not okay for a long-lived process such
> as a GUI tool that has linked with the Subversion library.
>
> If the GUI tool runs long enough, it can create so many un-reaped
> zombie processes that system resources are exhausted and it becomes
> impossible to spawn any new processes.
>
> The attached patch restores the original call to
> apr_pool_note_subprocess thus guaranteeing no zombies are ever
> created. It also arranges for the tunnel process to become detached
> so that it can "die in piece, in its own time, on its own terms" as
> the log comment for revision 35533 mentions, but without causing
> zombie processes to be created.
>
> This change retains full compatibility with ssh connection pooling
> while eliminating the zombie problem. See issue #2580 for the
> original impetus for the change made in revision 35533.
>
> http://subversion.tigris.org/issues/show_bug.cgi?id=2580
>
> Kyle
>
> ------------------------------------------------------
> http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1462473
> <client.c-patch>
------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1814682
Re: [PATCH] fix for ssh zombies introduced with r35533
Posted by "Hyrum K. Wright" <hy...@mail.utexas.edu>.
Kyle,
Just letting you know this hasn't fallen on deaf ears. It's on my
stack somewhere. :)
-Hyrum
On Mar 28, 2009, at 6:04 PM, Kyle McKay wrote:
> [[[
> Fix ssh zombie problem introduced with revision 35533
>
> * subversion/libsvn_ra_svn/client.c
> (make_tunnel): Fully detach tunnel process to avoid having it
> receive signals while restoring the original
> apr_pool_note_subprocess to avoid creating zombies.
> ]]]
>
> As detailed in this thread:
>
> http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1445592
>
> regarding the change introduced into client.c by revision 35533:
>
> http://svn.collab.net/viewvc/svn?view=revision&revision=35533
>
> On Mar 27, 2009, at 09:51, Hyrum K. Wright wrote:
>> The case which drove r35533 was a user who uses ssh connection
>> pooling for svn connections.
>
> Change r35533 removed the call to to apr_pool_note_subprocess
> meaning that Subversion never reaps any of its children. This may
> be okay for a short-lived process, but is not okay for a long-lived
> process such as a GUI tool that has linked with the Subversion
> library.
>
> If the GUI tool runs long enough, it can create so many un-reaped
> zombie processes that system resources are exhausted and it becomes
> impossible to spawn any new processes.
>
> The attached patch restores the original call to
> apr_pool_note_subprocess thus guaranteeing no zombies are ever
> created. It also arranges for the tunnel process to become detached
> so that it can "die in piece, in its own time, on its own terms" as
> the log comment for revision 35533 mentions, but without causing
> zombie processes to be created.
>
> This change retains full compatibility with ssh connection pooling
> while eliminating the zombie problem. See issue #2580 for the
> original impetus for the change made in revision 35533.
>
> http://subversion.tigris.org/issues/show_bug.cgi?id=2580
>
> Kyle
>
> <client.c-patch>
------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1523143