You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@subversion.apache.org by Kyle McKay <ma...@gmail.com> on 2009/03/28 23:04:18 UTC

[PATCH] fix for ssh zombies introduced with r35533

[[[
Fix ssh zombie problem introduced with revision 35533

* subversion/libsvn_ra_svn/client.c
   (make_tunnel): Fully detach tunnel process to avoid having it
                  receive signals while restoring the original
                  apr_pool_note_subprocess to avoid creating zombies.
]]]

As detailed in this thread:

http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1445592

regarding the change introduced into client.c by revision 35533:

http://svn.collab.net/viewvc/svn?view=revision&revision=35533

On Mar 27, 2009, at 09:51, Hyrum K. Wright wrote:
> The case which drove r35533 was a user who uses ssh connection  
> pooling for svn connections.

Change r35533 removed the call to to apr_pool_note_subprocess meaning  
that Subversion never reaps any of its children.  This may be okay for  
a short-lived process, but is not okay for a long-lived process such  
as a GUI tool that has linked with the Subversion library.

If the GUI tool runs long enough, it can create so many un-reaped  
zombie processes that system resources are exhausted and it becomes  
impossible to spawn any new processes.

The attached patch restores the original call to  
apr_pool_note_subprocess thus guaranteeing no zombies are ever  
created.  It also arranges for the tunnel process to become detached  
so that it can "die in piece, in its own time, on its own terms" as  
the log comment for revision 35533 mentions, but without causing  
zombie processes to be created.

This change retains full compatibility with ssh connection pooling  
while eliminating the zombie problem.  See issue #2580 for the  
original impetus for the change made in revision 35533.

http://subversion.tigris.org/issues/show_bug.cgi?id=2580

Kyle

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=1462475

To unsubscribe from this discussion, e-mail: [users-unsubscribe@subversion.tigris.org].

Re: [PATCH] fix for ssh zombies introduced with r35533

Posted by Gavin Baumanis <ga...@thespidernet.com>.
Ping. This patch has received no comments.
(Discounting Hyrum's - "Your patch has not fallen on deaf ears post").


Gavin.

On 29/03/2009, at 10:04 AM, Kyle McKay wrote:

> [[[
> Fix ssh zombie problem introduced with revision 35533
>
> * subversion/libsvn_ra_svn/client.c
>   (make_tunnel): Fully detach tunnel process to avoid having it
>                  receive signals while restoring the original
>                  apr_pool_note_subprocess to avoid creating zombies.
> ]]]
>
> As detailed in this thread:
>
> http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1445592
>
> regarding the change introduced into client.c by revision 35533:
>
> http://svn.collab.net/viewvc/svn?view=revision&revision=35533
>
> On Mar 27, 2009, at 09:51, Hyrum K. Wright wrote:
>> The case which drove r35533 was a user who uses ssh connection
>> pooling for svn connections.
>
> Change r35533 removed the call to to apr_pool_note_subprocess meaning
> that Subversion never reaps any of its children.  This may be okay for
> a short-lived process, but is not okay for a long-lived process such
> as a GUI tool that has linked with the Subversion library.
>
> If the GUI tool runs long enough, it can create so many un-reaped
> zombie processes that system resources are exhausted and it becomes
> impossible to spawn any new processes.
>
> The attached patch restores the original call to
> apr_pool_note_subprocess thus guaranteeing no zombies are ever
> created.  It also arranges for the tunnel process to become detached
> so that it can "die in piece, in its own time, on its own terms" as
> the log comment for revision 35533 mentions, but without causing
> zombie processes to be created.
>
> This change retains full compatibility with ssh connection pooling
> while eliminating the zombie problem.  See issue #2580 for the
> original impetus for the change made in revision 35533.
>
> http://subversion.tigris.org/issues/show_bug.cgi?id=2580
>
> Kyle
>
> ------------------------------------------------------
> http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1462473 
> <client.c-patch>

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1688806

Re: [PATCH] fix for ssh zombies introduced with r35533

Posted by James Y Knight <fo...@fuhm.net>.
On Apr 21, 2009, at 7:15 AM, Greg Hudson wrote:
> It already does this; try running svnserve -t and hitting ctrl-D to
> terminate the input stream, and you'll see that it exits.
>
> (I don't know why that's not enough to solve the problem.)

I expect it's because svnserve doesn't monitor the input stream for  
EOF while it's blocking trying to write to the output stream.

James

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=1848217

To unsubscribe from this discussion, e-mail: [users-unsubscribe@subversion.tigris.org].

Re: [PATCH] fix for ssh zombies introduced with r35533

Posted by Kyle McKay <ma...@gmail.com>.
On Apr 21, 2009, at 04:15, Greg Hudson wrote:
> On Tue, 2009-04-21 at 00:34 -0400, James Y Knight wrote:
>> Perhaps the svnserve
>> process to notice that its *input* pipe had been closed, and abort  
>> the
>> connection.
>
> It already does this; try running svnserve -t and hitting ctrl-D to
> terminate the input stream, and you'll see that it exits.
>
> (I don't know why that's not enough to solve the problem.)

It's not enough because Apple's sshd codebase is broken.  I have filed  
bug report rdar://6810722 with Apple about this.  Here are some  
details from an email exchange I've been having:

On Apr 20, 2009, at 23:53, Damien Miller wrote:
> On Mon, 20 Apr 2009, Kyle McKay wrote:
>
>> Thanks for your response.
>>
>> On Apr 20, 2009, at 23:10, Damien Miller wrote:
>>>
>>> This doesn't look like bug#85 based on what you sent below. It looks
>>> more like a whatever that you were running failing to notice its  
>>> stdout
>>> closing.
>>
>> Actually after looking at this some more, I think Apple's sshd is  
>> just
>> missing the Bug 85 patch and I think the debug messages back this up.
>
> No, Apple's sshd clearly does have the bug #85 patch:
>
>> debug2: channel 0: rcvd eow
>
> Would not be printed otherwise. Perhaps they have forcibly disabled
> USE_PIPES in session.c? It is required for the patch to correctly
> function.
>
>> And in this case sshd included the Bug 85 fix and so was using  
>> pipes to
>> communicate with the child and the child therefore got a SIGPIPE on  
>> the input
>> open -> closed transition and exited.
>
> No, in both cases the signalling between sshd and the child process is
> identical from the logs. There may be differences depending on whether
> Apple has modified their sshd to avoid using pipes (thwarting the
> half-close fix in the process), but you will need to post a full debug
> log from the server to tell.
>
> -d


Yup, you're right.  Apple has disabled the USE_PIPES define in their  
source code.

Here's the original file:

http://www.opensource.apple.com/darwinsource/10.5.6/OpenSSH-95.1.5/openssh/session.c.orig

And the file they're using:

http://www.opensource.apple.com/darwinsource/10.5.6/OpenSSH-95.1.5/openssh/session.c

Here's the diff:

--- session.c.orig
+++ session.c
@@ -424,7 +424,9 @@
	}
}

+#ifndef __APPLE__
#define USE_PIPES
+#endif
/*
  * This is called to fork and execute a command when we have no tty.   
This
  * will call do_child from the child, and server_loop from the parent  
after
@@ -2082,8 +2084,10 @@
		n_bytes = packet_remaining();
	tty_parse_modes(s->ttyfd, &n_bytes);

+#ifndef __APPLE_PRIVPTY__
	if (!use_privsep)
		pty_setowner(s->pw, s->tty);
+#endif

	/* Set window size from the packet. */
	pty_change_window_size(s->ptyfd, s->row, s->col, s->xpixel, s->ypixel);
@@ -2321,9 +2325,11 @@
	if (s->pid != 0)
		record_logout(s->pid, s->tty, s->pw->pw_name);

+#ifndef __APPLE_PRIVPTY__
	/* Release the pseudo-tty. */
	if (getuid() == 0)
		pty_release(s->tty);
+#endif

	/*
	 * Close the server side of the socket pairs.  We must do this after

Thanks for your help.

Kyle

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=1842264

To unsubscribe from this discussion, e-mail: [users-unsubscribe@subversion.tigris.org].

Re: [PATCH] fix for ssh zombies introduced with r35533

Posted by Greg Hudson <gh...@mit.edu>.
On Tue, 2009-04-21 at 00:34 -0400, James Y Knight wrote:
> Perhaps the svnserve  
> process to notice that its *input* pipe had been closed, and abort the  
> connection.

It already does this; try running svnserve -t and hitting ctrl-D to
terminate the input stream, and you'll see that it exits.

(I don't know why that's not enough to solve the problem.)

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1841824

Re: [PATCH] fix for ssh zombies introduced with r35533

Posted by James Y Knight <fo...@fuhm.net>.
On Apr 20, 2009, at 10:46 PM, Kyle McKay wrote:

> Thanks for your details.  I can now reproduce the problem you describe
> provided I'm using a version of OpenSSH sshd that does not contain the
> fix for OpenSSH Bug 85 and the ssh client connects to it using
> Protocol 2:
>
>   https://bugzilla.mindrot.org/show_bug.cgi?id=85
>
> The bug is actually endemic to the sshd code, not the ssh code.
>
> To test whether or not any particular sshd suffers from the bug, do
> this:
>
> ssh -2 <ssh_host_to_test> yes | true

Wow, NICE job tracking that down!

This is rather insidious, because it's actually an SSHv2 protocol  
specification deficiency, not actually an implementation bug. Thus, it  
occurs if *either* the client *or* the server isn't patched, e.g., a  
OpenSSH_3.9p1 connecting to a OpenSSH_5.1p1 machine still sees this  
hang. Furthermore, the fix to the hang is an OpenSSH-specific protocol  
extension, which OpenSSH will only send to other OpenSSH  
implementations.

See "3. connection: Channel write close extension" in:
http://www.openbsd.org/cgi-bin/cvsweb/src/usr.bin/ssh/PROTOCOL?rev=1.12;content-type=text%2Fplain

> Whether or not to fix the zombie problem (it should definitely be
> fixed ASAP) is a different issue from whether or not to revert support
> for issue 2580.  Deciding to revert issue 2580 support also happens to
> fix the zombie problem.

Indeed.

Despite all the above talk about ssh, I think ultimately this truly is  
a subversion bug. SVN is responsible for both ends of the connection.  
When the subversion client gets a SIGPIPE on its output, it can do  
something to make sure it cleans up properly. One way to do this is to  
kill the ssh client process, like it did before.

There are other alternatives, although they're probably enough work  
that they're not doable in the 1.6 patch series. Perhaps the svnserve  
process to notice that its *input* pipe had been closed, and abort the  
connection. This, ssh *does* support: "ssh hostname cat < /dev/null"  
never hangs on any version of ssh I know of, demonstrating that if  
svnserve similarly noticed a closed input, it could properly shut  
down. Or perhaps the client could upon receipt of a signal, not exit  
immediately, but instead send a "cancel" command to the server,  
read&discard further data from ssh, and wait for the ssh process to  
exit first. Or perhaps there's yet another solution.

Of course, Subversion *could* simply declare that it requires that you  
use OpenSSH >= 5.1, but IMO that isn't very sensible. Not only is this  
version rather new and many people aren't using it yet, but it really  
is an *OpenSSH* dependency. Other clients or servers won't do. So, I  
do think the change made for issue 2580 should be reverted...at least  
for now.

At my workplace, nearly everything is using ssh versions < 5.1, and it  
would cause serious hurt for the svn server if people used unpatched  
subversion 1.6 clients which don't shutdown their ssh tunnels. I  
already reverted r35533 in the subversion build I created, and told  
everyone else to do the same if they make custom builds. But I imagine  
the same issue will occur for many other organizations.

James

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1836728

Re: [PATCH] fix for ssh zombies introduced with r35533

Posted by James Y Knight <fo...@fuhm.net>.
On Apr 20, 2009, at 10:46 PM, Kyle McKay wrote:

> Thanks for your details.  I can now reproduce the problem you describe
> provided I'm using a version of OpenSSH sshd that does not contain the
> fix for OpenSSH Bug 85 and the ssh client connects to it using
> Protocol 2:
>
>   https://bugzilla.mindrot.org/show_bug.cgi?id=85
>
> The bug is actually endemic to the sshd code, not the ssh code.
>
> To test whether or not any particular sshd suffers from the bug, do
> this:
>
> ssh -2 <ssh_host_to_test> yes | true

Wow, NICE job tracking that down!

This is rather insidious, because it's actually an SSHv2 protocol  
specification deficiency, not actually an implementation bug. Thus, it  
occurs if *either* the client *or* the server isn't patched, e.g., a  
OpenSSH_3.9p1 connecting to a OpenSSH_5.1p1 machine still sees this  
hang. Furthermore, the fix to the hang is an OpenSSH-specific protocol  
extension, which OpenSSH will only send to other OpenSSH  
implementations.

See "3. connection: Channel write close extension" in:
http://www.openbsd.org/cgi-bin/cvsweb/src/usr.bin/ssh/PROTOCOL?rev=1.12;content-type=text%2Fplain

> Whether or not to fix the zombie problem (it should definitely be
> fixed ASAP) is a different issue from whether or not to revert support
> for issue 2580.  Deciding to revert issue 2580 support also happens to
> fix the zombie problem.

Indeed.

Despite all the above talk about ssh, I think ultimately this truly is  
a subversion bug. SVN is responsible for both ends of the connection.  
When the subversion client gets a SIGPIPE on its output, it can do  
something to make sure it cleans up properly. One way to do this is to  
kill the ssh client process, like it did before.

There are other alternatives, although they're probably enough work  
that they're not doable in the 1.6 patch series. Perhaps the svnserve  
process to notice that its *input* pipe had been closed, and abort the  
connection. This, ssh *does* support: "ssh hostname cat < /dev/null"  
never hangs on any version of ssh I know of, demonstrating that if  
svnserve similarly noticed a closed input, it could properly shut  
down. Or perhaps the client could upon receipt of a signal, not exit  
immediately, but instead send a "cancel" command to the server,  
read&discard further data from ssh, and wait for the ssh process to  
exit first. Or perhaps there's yet another solution.

Of course, Subversion *could* simply declare that it requires that you  
use OpenSSH >= 5.1, but IMO that isn't very sensible. Not only is this  
version rather new and many people aren't using it yet, but it really  
is an *OpenSSH* dependency. Other clients or servers won't do. So, I  
do think the change made for issue 2580 should be reverted...at least  
for now.

At my workplace, nearly everything is using ssh versions < 5.1, and it  
would cause serious hurt for the svn server if people used unpatched  
subversion 1.6 clients which don't shutdown their ssh tunnels. I  
already reverted r35533 in the subversion build I created, and told  
everyone else to do the same if they make custom builds. But I imagine  
the same issue will occur for many other organizations.

James

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=1836729

To unsubscribe from this discussion, e-mail: [users-unsubscribe@subversion.tigris.org].

Re: [PATCH] fix for ssh zombies introduced with r35533

Posted by Kyle McKay <ma...@gmail.com>.
James,

Thanks for your details.  I can now reproduce the problem you describe  
provided I'm using a version of OpenSSH sshd that does not contain the  
fix for OpenSSH Bug 85 and the ssh client connects to it using  
Protocol 2:

   https://bugzilla.mindrot.org/show_bug.cgi?id=85

The bug is actually endemic to the sshd code, not the ssh code.

To test whether or not any particular sshd suffers from the bug, do  
this:

ssh -2 <ssh_host_to_test> yes | true

If after responding to any prompts/entering any passwords, the  
connection hangs, the sshd server at <ssh_host_to_test> suffers from  
the bug.  Note that the client must connect with Protocol 2 (which is  
generally the default nowadays).  If the client connects with Protocol  
1 (which is often disabled nowadays), the bug does not happen.

Some results:

       Up-to-date Ubuntu 8.10: Bug 85 fixed
   Up-to-date Mac OS X 10.5.6: Bug 85 PRESENT
     Up-to-date MacPorts sshd: Bug 85 fixed

The other reason I wasn't able to duplicate the bug initially is that  
apparently the following:

   svn log svn+ssh://<some_host>/<some_repos> | head

has a race condition.  If the results coming back from the remote  
svnserve -t instance manage to be sent completely and buffered in  
memory before head closes the pipe, you don't see a problem.

For example this often demonstrates the problem:

   svn log svn+ssh://<some_host>/<some_repos> | true

whereas this often does not (provided you wait patiently for sleep to  
exit without interrupting it):

   svn log svn+ssh://<some_host>/<some_repos> | sleep 15

So to sum it up, the problem is an already-fixed OpenSSH bug distinct  
from Subversion issue 2580.  Fixing Subversion issue 2580 merely  
exposes the OpenSSH bug (provided you have an old sshd that doesn't  
contain the fix).

The patch I added to 2580 is not intended to fix the OpenSSH problem,  
it is only intended to eliminate the zombie problem introduced by  
change 35533.

Whether or not to fix the zombie problem (it should definitely be  
fixed ASAP) is a different issue from whether or not to revert support  
for issue 2580.  Deciding to revert issue 2580 support also happens to  
fix the zombie problem.

Kyle

On Apr 19, 2009, at 20:52, James Y Knight wrote:
> On Apr 19, 2009, at 10:45 PM, Kyle McKay wrote:
>
>> I cannot duplicate the problem described by James using the released
>> version of Subversion 1.6.1 with the patch I attached to issue 2580
>> applied:
>>
>> http://subversion.tigris.org/issues/show_bug.cgi?id=2580
>
> I see the issue both with and without your patch attached to issue  
> 2580 applied, which is to be expected, as your patch does not  
> attempt to fix anything but the issue of "zombie" processes. These  
> aren't zombie processes, but real ssh processes sitting around doing  
> nothing long after the svn process has disappeared.

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=1835189

To unsubscribe from this discussion, e-mail: [users-unsubscribe@subversion.tigris.org].

Re: [PATCH] fix for ssh zombies introduced with r35533

Posted by Kyle McKay <ma...@gmail.com>.
James,

Thanks for your details.  I can now reproduce the problem you describe  
provided I'm using a version of OpenSSH sshd that does not contain the  
fix for OpenSSH Bug 85 and the ssh client connects to it using  
Protocol 2:

   https://bugzilla.mindrot.org/show_bug.cgi?id=85

The bug is actually endemic to the sshd code, not the ssh code.

To test whether or not any particular sshd suffers from the bug, do  
this:

ssh -2 <ssh_host_to_test> yes | true

If after responding to any prompts/entering any passwords, the  
connection hangs, the sshd server at <ssh_host_to_test> suffers from  
the bug.  Note that the client must connect with Protocol 2 (which is  
generally the default nowadays).  If the client connects with Protocol  
1 (which is often disabled nowadays), the bug does not happen.

Some results:

       Up-to-date Ubuntu 8.10: Bug 85 fixed
   Up-to-date Mac OS X 10.5.6: Bug 85 PRESENT
     Up-to-date MacPorts sshd: Bug 85 fixed

The other reason I wasn't able to duplicate the bug initially is that  
apparently the following:

   svn log svn+ssh://<some_host>/<some_repos> | head

has a race condition.  If the results coming back from the remote  
svnserve -t instance manage to be sent completely and buffered in  
memory before head closes the pipe, you don't see a problem.

For example this often demonstrates the problem:

   svn log svn+ssh://<some_host>/<some_repos> | true

whereas this often does not (provided you wait patiently for sleep to  
exit without interrupting it):

   svn log svn+ssh://<some_host>/<some_repos> | sleep 15

So to sum it up, the problem is an already-fixed OpenSSH bug distinct  
from Subversion issue 2580.  Fixing Subversion issue 2580 merely  
exposes the OpenSSH bug (provided you have an old sshd that doesn't  
contain the fix).

The patch I added to 2580 is not intended to fix the OpenSSH problem,  
it is only intended to eliminate the zombie problem introduced by  
change 35533.

Whether or not to fix the zombie problem (it should definitely be  
fixed ASAP) is a different issue from whether or not to revert support  
for issue 2580.  Deciding to revert issue 2580 support also happens to  
fix the zombie problem.

Kyle

On Apr 19, 2009, at 20:52, James Y Knight wrote:
> On Apr 19, 2009, at 10:45 PM, Kyle McKay wrote:
>
>> I cannot duplicate the problem described by James using the released
>> version of Subversion 1.6.1 with the patch I attached to issue 2580
>> applied:
>>
>> http://subversion.tigris.org/issues/show_bug.cgi?id=2580
>
> I see the issue both with and without your patch attached to issue  
> 2580 applied, which is to be expected, as your patch does not  
> attempt to fix anything but the issue of "zombie" processes. These  
> aren't zombie processes, but real ssh processes sitting around doing  
> nothing long after the svn process has disappeared.

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1835186

Re: [PATCH] fix for ssh zombies introduced with r35533

Posted by James Y Knight <fo...@fuhm.net>.
On Apr 19, 2009, at 10:45 PM, Kyle McKay wrote:

> I cannot duplicate the problem described by James using the released
> version of Subversion 1.6.1 with the patch I attached to issue 2580
> applied:
>
> http://subversion.tigris.org/issues/show_bug.cgi?id=2580


I see the issue both with and without your patch attached to issue  
2580 applied, which is to be expected, as your patch does not attempt  
to fix anything but the issue of "zombie" processes. These aren't  
zombie processes, but real ssh processes sitting around doing nothing  
long after the svn process has disappeared.

To reiterate: with svn 1.6 with or without your patch applied, with  
ssh connection pooling disabled, this command:
  svn log svn+ssh://hostname/repo|head
causes the ssh process on the client, as well as an ssh and svnserve  
on the server to hang around forever, as "real" processes.

Did you try reproducing with svn connection pooling disabled?

> Admittedly both of these tests were run on Mac OS X 10.5.6 (which  
> claims to be an Open Brand UNIX 03 Registered Product, conforming to  
> the SUSv3 and POSIX 1003.1 specifications for the C API, Shell  
> Utilities, and Threads) -- I don't know what system James was using  
> and it may behave differently.

I originally tested with a Linux CentOS5 system, which has
  OpenSSH_4.3p2, OpenSSL 0.9.8b 04 May 2006

I've also just run the same test on OSX Leopard 10.5.6, which has  
OpenSSH_5.1p1, OpenSSL 0.9.7l 28 Sep 2006.

To attempt to make sure a server-side configuration was not the issue,  
I also tried against a number of different servers, including CentOS  
5, Debian Lenny, and the OSX Leopard 10.5.6 system (all talking to the  
default subversion server coming with each system, not a newly built  
1.6) . All exhibited the same behavior. This does not appear to be  
system specific, and I would be astounded if this problem did not  
occur for everyone.

More details of the test I just ran on OSX:

With a fresh compile of svn 1.6.1 + svn 1.6.1 deps tarballs,  
unmodified. I have no ~/.ssh/config file, and /etc/ssh_config is empty  
except for comments. Similarly, no ~/.subversion/config.

First I made a test repository:
svnadmin create /tmp/svnthing
for x in {0001..1000}; do svn mkdir file:///tmp/svnthing/$x -m "Making  
new directory"; done

Then, in /tmp/subversion-1.6.1, where I just built svn, I run:
   ./subversion/svn/svn log svn+ssh://localhost/tmp/svnthing | head

It prints:
------------------------------------------------------------------------
r1000 | jknight | 2009-04-19 23:40:02 -0400 (Sun, 19 Apr 2009) | 1 line

Making new directory
------------------------------------------------------------------------
r999 | jknight | 2009-04-19 23:40:02 -0400 (Sun, 19 Apr 2009) | 1 line

Making new directory
------------------------------------------------------------------------
r998 | jknight | 2009-04-19 23:40:02 -0400 (Sun, 19 Apr 2009) | 1 line
svn: Write error: Broken pipe


I then end up with a bunch of processes hanging around. As far as I  
know, they'll be there forever (1 for the client, 3 for the server).
jknight  75255   0.0  0.1    75736   1020 s003  S    11:43PM   0:00.02  
ssh localhost svnserve -t
root     75257   0.0  0.2    79596   1624   ??  S    11:43PM    
0:00.05 /usr/sbin/sshd -i
jknight  75260   0.0  0.1    79692    576   ??  S    11:43PM    
0:00.00 /usr/sbin/sshd -i
jknight  75261   0.0  0.1    77392   1200   ??  Ss   11:43PM   0:00.02  
svnserve -t

James

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1817626

Re: [PATCH] fix for ssh zombies introduced with r35533

Posted by James Y Knight <fo...@fuhm.net>.
On Apr 19, 2009, at 10:45 PM, Kyle McKay wrote:

> I cannot duplicate the problem described by James using the released
> version of Subversion 1.6.1 with the patch I attached to issue 2580
> applied:
>
> http://subversion.tigris.org/issues/show_bug.cgi?id=2580


I see the issue both with and without your patch attached to issue  
2580 applied, which is to be expected, as your patch does not attempt  
to fix anything but the issue of "zombie" processes. These aren't  
zombie processes, but real ssh processes sitting around doing nothing  
long after the svn process has disappeared.

To reiterate: with svn 1.6 with or without your patch applied, with  
ssh connection pooling disabled, this command:
  svn log svn+ssh://hostname/repo|head
causes the ssh process on the client, as well as an ssh and svnserve  
on the server to hang around forever, as "real" processes.

Did you try reproducing with svn connection pooling disabled?

> Admittedly both of these tests were run on Mac OS X 10.5.6 (which  
> claims to be an Open Brand UNIX 03 Registered Product, conforming to  
> the SUSv3 and POSIX 1003.1 specifications for the C API, Shell  
> Utilities, and Threads) -- I don't know what system James was using  
> and it may behave differently.

I originally tested with a Linux CentOS5 system, which has
  OpenSSH_4.3p2, OpenSSL 0.9.8b 04 May 2006

I've also just run the same test on OSX Leopard 10.5.6, which has  
OpenSSH_5.1p1, OpenSSL 0.9.7l 28 Sep 2006.

To attempt to make sure a server-side configuration was not the issue,  
I also tried against a number of different servers, including CentOS  
5, Debian Lenny, and the OSX Leopard 10.5.6 system (all talking to the  
default subversion server coming with each system, not a newly built  
1.6) . All exhibited the same behavior. This does not appear to be  
system specific, and I would be astounded if this problem did not  
occur for everyone.

More details of the test I just ran on OSX:

With a fresh compile of svn 1.6.1 + svn 1.6.1 deps tarballs,  
unmodified. I have no ~/.ssh/config file, and /etc/ssh_config is empty  
except for comments. Similarly, no ~/.subversion/config.

First I made a test repository:
svnadmin create /tmp/svnthing
for x in {0001..1000}; do svn mkdir file:///tmp/svnthing/$x -m "Making  
new directory"; done

Then, in /tmp/subversion-1.6.1, where I just built svn, I run:
   ./subversion/svn/svn log svn+ssh://localhost/tmp/svnthing | head

It prints:
------------------------------------------------------------------------
r1000 | jknight | 2009-04-19 23:40:02 -0400 (Sun, 19 Apr 2009) | 1 line

Making new directory
------------------------------------------------------------------------
r999 | jknight | 2009-04-19 23:40:02 -0400 (Sun, 19 Apr 2009) | 1 line

Making new directory
------------------------------------------------------------------------
r998 | jknight | 2009-04-19 23:40:02 -0400 (Sun, 19 Apr 2009) | 1 line
svn: Write error: Broken pipe


I then end up with a bunch of processes hanging around. As far as I  
know, they'll be there forever (1 for the client, 3 for the server).
jknight  75255   0.0  0.1    75736   1020 s003  S    11:43PM   0:00.02  
ssh localhost svnserve -t
root     75257   0.0  0.2    79596   1624   ??  S    11:43PM    
0:00.05 /usr/sbin/sshd -i
jknight  75260   0.0  0.1    79692    576   ??  S    11:43PM    
0:00.00 /usr/sbin/sshd -i
jknight  75261   0.0  0.1    77392   1200   ??  Ss   11:43PM   0:00.02  
svnserve -t

James

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=1817627

To unsubscribe from this discussion, e-mail: [users-unsubscribe@subversion.tigris.org].

Re: [PATCH] fix for ssh zombies introduced with r35533

Posted by Kyle McKay <ma...@gmail.com>.
Gavin,

I cannot duplicate the problem described by James using the released  
version of Subversion 1.6.1 with the patch I attached to issue 2580  
applied:

http://subversion.tigris.org/issues/show_bug.cgi?id=2580

Using the technique in his thread you reference below (assuming that  
ssh connection pooling has been enabled as previously described), I  
tried two scenarios:

1. Run the test script and ssh into the machine hosting the repository  
while the test script is running.  It might take a few tries to get  
the secondary ssh to pick up the shared connection (because it will be  
torn down and re-setup in between each svn log connection), but it can  
be done.

In this case you can see the ssh svnserve -t process hanging around to  
support the ssh login even after the test script has stopped.  When  
you exit the ssh login you see a message "Shared connection to ...  
closed."  This is confirmation that the ssh login was using a shared  
connection created by the svn log test.

At this point the ssh svnserve -t process that was hanging around to  
support the ssh login has exited for me.

2. ssh into the machine hosting the repository BEFORE running the test  
script.  Run the test script (notice how much faster it runs) all the  
svn log commands appear to be using the shared connection.  Now exit  
the ssh login.

The login exits cleanly for me and does not leave any ssh process  
hanging around.

Admittedly both of these tests were run on Mac OS X 10.5.6 (which  
claims to be an Open Brand UNIX 03 Registered Product, conforming to  
the SUSv3 and POSIX 1003.1 specifications for the C API, Shell  
Utilities, and Threads) -- I don't know what system James was using  
and it may behave differently.

Probably the only way to make everyone happy is to do the following:

A. Add a configuration item to the [tunnels] section

tunnel-agent-handling = kill | term | detach

Where the default is "kill" if not specified.  This would be the  
behavior (and code path) identical to svn version 1.5.6 and earlier  
(APR_KILL_ALWAYS).  "term" would have the identical code path to  
"kill" except use APR_KILL_AFTER_TIMEOUT instead. "detach" would  
modify the behavior as follows:  For WIN32 systems APR_KILL_NEVER is  
used while on other systems APR_KILL_ALWAYS is used AND the additional  
code added by the patch I posted to issue 2580 becomes active (in the  
other cases, "kill" and "term", that code would be inactive).

B. Allow the ./configure script to change the default for tunnel-agent- 
handling if not specified from "kill" to either of the other options.

This would achieve the following:

1. 1.5.6 and earlier behavior restored as the default
2. No zombies created, ever
3. Debian can stop patching the svn distribution and just use a ./ 
configure option instead

I might be willing to do this work provided one of the committers  
stands up and publicly commits to seeing the patches actually  
submitted to the repository (as patch manager perhaps you could do  
this?).  While the svn project is indeed open source, it is not open  
to external commits and without a public commitment from one of the  
internal folks to see the patches are actually applied to trunk, I  
can't see spending any time on creating them if they're just going to  
sit around unapplied.

Kyle

P.S. Connection pooling is still possible in svn 1.5.6 and earlier on  
unix-like systems without premature connection termination using the  
simple workaround described here:

http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1452200

On Apr 19, 2009, at 16:51, Gavin 'Beau' Baumanis wrote:
> Hi Kyle,
>
> I have noticed that your email with patch, is dated March 30, 2009
>
> and the following thread by James Knight:
>
> http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1523947
>
> Is dated April 02, 2009.
>
> I can't quite follow if James' reply is in response to your patch  
> submission in THIS thread or the workaround suggestions in the above  
> hyperlink.
>
> So, Can you tell me if your patch is still valid, despite James'  
> comments?
> Does it require more work / comments from the list?
> Is it no longer valid as a result of James' comments and, so at  
> least for the time being, there is nothing more you would like /  
> need me to do with your patch (as Patch Manager)?
>
>
> Gavin.
>
>
> On 29/03/2009, at 10:04 AM, Kyle McKay wrote:
>
>> [[[
>> Fix ssh zombie problem introduced with revision 35533
>>
>> * subversion/libsvn_ra_svn/client.c
>>  (make_tunnel): Fully detach tunnel process to avoid having it
>>                 receive signals while restoring the original
>>                 apr_pool_note_subprocess to avoid creating zombies.
>> ]]]
>>
>> As detailed in this thread:
>>
>> http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1445592
>>
>> regarding the change introduced into client.c by revision 35533:
>>
>> http://svn.collab.net/viewvc/svn?view=revision&revision=35533
>>
>> On Mar 27, 2009, at 09:51, Hyrum K. Wright wrote:
>>> The case which drove r35533 was a user who uses ssh connection
>>> pooling for svn connections.
>>
>> Change r35533 removed the call to to apr_pool_note_subprocess meaning
>> that Subversion never reaps any of its children.  This may be okay  
>> for
>> a short-lived process, but is not okay for a long-lived process such
>> as a GUI tool that has linked with the Subversion library.
>>
>> If the GUI tool runs long enough, it can create so many un-reaped
>> zombie processes that system resources are exhausted and it becomes
>> impossible to spawn any new processes.
>>
>> The attached patch restores the original call to
>> apr_pool_note_subprocess thus guaranteeing no zombies are ever
>> created.  It also arranges for the tunnel process to become detached
>> so that it can "die in piece, in its own time, on its own terms" as
>> the log comment for revision 35533 mentions, but without causing
>> zombie processes to be created.
>>
>> This change retains full compatibility with ssh connection pooling
>> while eliminating the zombie problem.  See issue #2580 for the
>> original impetus for the change made in revision 35533.
>>
>> http://subversion.tigris.org/issues/show_bug.cgi?id=2580
>>
>> Kyle
>>
>> ------------------------------------------------------
>> http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1462473 
>> <client.c-patch>
>

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1816865

Re: [PATCH] fix for ssh zombies introduced with r35533

Posted by Kyle McKay <ma...@gmail.com>.
Gavin,

I cannot duplicate the problem described by James using the released  
version of Subversion 1.6.1 with the patch I attached to issue 2580  
applied:

http://subversion.tigris.org/issues/show_bug.cgi?id=2580

Using the technique in his thread you reference below (assuming that  
ssh connection pooling has been enabled as previously described), I  
tried two scenarios:

1. Run the test script and ssh into the machine hosting the repository  
while the test script is running.  It might take a few tries to get  
the secondary ssh to pick up the shared connection (because it will be  
torn down and re-setup in between each svn log connection), but it can  
be done.

In this case you can see the ssh svnserve -t process hanging around to  
support the ssh login even after the test script has stopped.  When  
you exit the ssh login you see a message "Shared connection to ...  
closed."  This is confirmation that the ssh login was using a shared  
connection created by the svn log test.

At this point the ssh svnserve -t process that was hanging around to  
support the ssh login has exited for me.

2. ssh into the machine hosting the repository BEFORE running the test  
script.  Run the test script (notice how much faster it runs) all the  
svn log commands appear to be using the shared connection.  Now exit  
the ssh login.

The login exits cleanly for me and does not leave any ssh process  
hanging around.

Admittedly both of these tests were run on Mac OS X 10.5.6 (which  
claims to be an Open Brand UNIX 03 Registered Product, conforming to  
the SUSv3 and POSIX 1003.1 specifications for the C API, Shell  
Utilities, and Threads) -- I don't know what system James was using  
and it may behave differently.

Probably the only way to make everyone happy is to do the following:

A. Add a configuration item to the [tunnels] section

tunnel-agent-handling = kill | term | detach

Where the default is "kill" if not specified.  This would be the  
behavior (and code path) identical to svn version 1.5.6 and earlier  
(APR_KILL_ALWAYS).  "term" would have the identical code path to  
"kill" except use APR_KILL_AFTER_TIMEOUT instead. "detach" would  
modify the behavior as follows:  For WIN32 systems APR_KILL_NEVER is  
used while on other systems APR_KILL_ALWAYS is used AND the additional  
code added by the patch I posted to issue 2580 becomes active (in the  
other cases, "kill" and "term", that code would be inactive).

B. Allow the ./configure script to change the default for tunnel-agent- 
handling if not specified from "kill" to either of the other options.

This would achieve the following:

1. 1.5.6 and earlier behavior restored as the default
2. No zombies created, ever
3. Debian can stop patching the svn distribution and just use a ./ 
configure option instead

I might be willing to do this work provided one of the committers  
stands up and publicly commits to seeing the patches actually  
submitted to the repository (as patch manager perhaps you could do  
this?).  While the svn project is indeed open source, it is not open  
to external commits and without a public commitment from one of the  
internal folks to see the patches are actually applied to trunk, I  
can't see spending any time on creating them if they're just going to  
sit around unapplied.

Kyle

P.S. Connection pooling is still possible in svn 1.5.6 and earlier on  
unix-like systems without premature connection termination using the  
simple workaround described here:

http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1452200

On Apr 19, 2009, at 16:51, Gavin 'Beau' Baumanis wrote:
> Hi Kyle,
>
> I have noticed that your email with patch, is dated March 30, 2009
>
> and the following thread by James Knight:
>
> http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1523947
>
> Is dated April 02, 2009.
>
> I can't quite follow if James' reply is in response to your patch  
> submission in THIS thread or the workaround suggestions in the above  
> hyperlink.
>
> So, Can you tell me if your patch is still valid, despite James'  
> comments?
> Does it require more work / comments from the list?
> Is it no longer valid as a result of James' comments and, so at  
> least for the time being, there is nothing more you would like /  
> need me to do with your patch (as Patch Manager)?
>
>
> Gavin.
>
>
> On 29/03/2009, at 10:04 AM, Kyle McKay wrote:
>
>> [[[
>> Fix ssh zombie problem introduced with revision 35533
>>
>> * subversion/libsvn_ra_svn/client.c
>>  (make_tunnel): Fully detach tunnel process to avoid having it
>>                 receive signals while restoring the original
>>                 apr_pool_note_subprocess to avoid creating zombies.
>> ]]]
>>
>> As detailed in this thread:
>>
>> http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1445592
>>
>> regarding the change introduced into client.c by revision 35533:
>>
>> http://svn.collab.net/viewvc/svn?view=revision&revision=35533
>>
>> On Mar 27, 2009, at 09:51, Hyrum K. Wright wrote:
>>> The case which drove r35533 was a user who uses ssh connection
>>> pooling for svn connections.
>>
>> Change r35533 removed the call to to apr_pool_note_subprocess meaning
>> that Subversion never reaps any of its children.  This may be okay  
>> for
>> a short-lived process, but is not okay for a long-lived process such
>> as a GUI tool that has linked with the Subversion library.
>>
>> If the GUI tool runs long enough, it can create so many un-reaped
>> zombie processes that system resources are exhausted and it becomes
>> impossible to spawn any new processes.
>>
>> The attached patch restores the original call to
>> apr_pool_note_subprocess thus guaranteeing no zombies are ever
>> created.  It also arranges for the tunnel process to become detached
>> so that it can "die in piece, in its own time, on its own terms" as
>> the log comment for revision 35533 mentions, but without causing
>> zombie processes to be created.
>>
>> This change retains full compatibility with ssh connection pooling
>> while eliminating the zombie problem.  See issue #2580 for the
>> original impetus for the change made in revision 35533.
>>
>> http://subversion.tigris.org/issues/show_bug.cgi?id=2580
>>
>> Kyle
>>
>> ------------------------------------------------------
>> http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1462473 
>> <client.c-patch>
>

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=1816867

To unsubscribe from this discussion, e-mail: [users-unsubscribe@subversion.tigris.org].

Re: [PATCH] fix for ssh zombies introduced with r35533

Posted by Gavin Baumanis <ga...@thespidernet.com>.
Hi Kyle,

I have noticed that your email with patch, is dated March 30, 2009

and the following thread by James Knight;
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1523947
Is dated April 02, 2009.

I can't quite follow if James' reply is in response to your patch  
submission in THIS thread or the workaround suggestions in the above  
hyperlink.

So, Can you tell me if your patch is still valid, despite James'  
comments?
Does it require more work / comments from the list?
Is it no longer valid as a result of James' comments and, so at least  
for the time being, there is nothing more you would like / need me to  
do with your patch (as Patch Manager)?


Gavin.


On 29/03/2009, at 10:04 AM, Kyle McKay wrote:

> [[[
> Fix ssh zombie problem introduced with revision 35533
>
> * subversion/libsvn_ra_svn/client.c
>   (make_tunnel): Fully detach tunnel process to avoid having it
>                  receive signals while restoring the original
>                  apr_pool_note_subprocess to avoid creating zombies.
> ]]]
>
> As detailed in this thread:
>
> http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1445592
>
> regarding the change introduced into client.c by revision 35533:
>
> http://svn.collab.net/viewvc/svn?view=revision&revision=35533
>
> On Mar 27, 2009, at 09:51, Hyrum K. Wright wrote:
>> The case which drove r35533 was a user who uses ssh connection
>> pooling for svn connections.
>
> Change r35533 removed the call to to apr_pool_note_subprocess meaning
> that Subversion never reaps any of its children.  This may be okay for
> a short-lived process, but is not okay for a long-lived process such
> as a GUI tool that has linked with the Subversion library.
>
> If the GUI tool runs long enough, it can create so many un-reaped
> zombie processes that system resources are exhausted and it becomes
> impossible to spawn any new processes.
>
> The attached patch restores the original call to
> apr_pool_note_subprocess thus guaranteeing no zombies are ever
> created.  It also arranges for the tunnel process to become detached
> so that it can "die in piece, in its own time, on its own terms" as
> the log comment for revision 35533 mentions, but without causing
> zombie processes to be created.
>
> This change retains full compatibility with ssh connection pooling
> while eliminating the zombie problem.  See issue #2580 for the
> original impetus for the change made in revision 35533.
>
> http://subversion.tigris.org/issues/show_bug.cgi?id=2580
>
> Kyle
>
> ------------------------------------------------------
> http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1462473 
> <client.c-patch>

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1814682

Re: [PATCH] fix for ssh zombies introduced with r35533

Posted by "Hyrum K. Wright" <hy...@mail.utexas.edu>.
Kyle,
Just letting you know this hasn't fallen on deaf ears.  It's on my  
stack somewhere. :)

-Hyrum

On Mar 28, 2009, at 6:04 PM, Kyle McKay wrote:

> [[[
> Fix ssh zombie problem introduced with revision 35533
>
> * subversion/libsvn_ra_svn/client.c
>  (make_tunnel): Fully detach tunnel process to avoid having it
>                 receive signals while restoring the original
>                 apr_pool_note_subprocess to avoid creating zombies.
> ]]]
>
> As detailed in this thread:
>
> http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1445592
>
> regarding the change introduced into client.c by revision 35533:
>
> http://svn.collab.net/viewvc/svn?view=revision&revision=35533
>
> On Mar 27, 2009, at 09:51, Hyrum K. Wright wrote:
>> The case which drove r35533 was a user who uses ssh connection  
>> pooling for svn connections.
>
> Change r35533 removed the call to to apr_pool_note_subprocess  
> meaning that Subversion never reaps any of its children.  This may  
> be okay for a short-lived process, but is not okay for a long-lived  
> process such as a GUI tool that has linked with the Subversion  
> library.
>
> If the GUI tool runs long enough, it can create so many un-reaped  
> zombie processes that system resources are exhausted and it becomes  
> impossible to spawn any new processes.
>
> The attached patch restores the original call to  
> apr_pool_note_subprocess thus guaranteeing no zombies are ever  
> created.  It also arranges for the tunnel process to become detached  
> so that it can "die in piece, in its own time, on its own terms" as  
> the log comment for revision 35533 mentions, but without causing  
> zombie processes to be created.
>
> This change retains full compatibility with ssh connection pooling  
> while eliminating the zombie problem.  See issue #2580 for the  
> original impetus for the change made in revision 35533.
>
> http://subversion.tigris.org/issues/show_bug.cgi?id=2580
>
> Kyle
>
> <client.c-patch>

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1523143