Posted to users@subversion.apache.org by Kyle McKay <ma...@gmail.com> on 2009/03/27 16:21:34 UTC

zombie ssh processes


Re: zombie ssh processes

Posted by James Y Knight <fo...@fuhm.net>.
On Apr 2, 2009, at 4:29 PM, James Y Knight wrote:
> In my testing of the svn 1.6 client, I found that I end up with a  
> *lot* of ssh connections to the server sitting around doing nothing.  
> They aren't zombies, they're fully-fledged ssh processes hanging  
> around doing nothing with no svn process left to talk to. This could  
> get very bad on the server if a lot of users end up with a lot of  
> open idle connections.

Ping?

It seems to me that it's an extremely serious bug that running "svn  
log $repos|head" via svn+ssh:// will keep an ssh connection open  
forever every time you run it. Am I alone in thinking it's a serious  
bug or does nobody else see it?

James

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1580265

Re: zombie ssh processes

Posted by James Y Knight <fo...@fuhm.net>.
Kyle McKay wrote:
> ---------------
> RECOMMENDATIONS
> ---------------
> 1. Short term. Revert change 35533 and then change the
> APR_KILL_ALWAYS to an APR_KILL_NEVER. This will likely eliminate most
> of the zombie problems for long-lived processes using the Subversion
> library while remaining compatible with ssh connection pooling
> (ControlMaster/ControlPath).
> 2. Longer term. Add an option to ~/.subversion/config file ([tunnels]
> section?) that lets you select the apr_kill_conditions_e value passed
> to apr_pool_note_subprocess with it defaulting to APR_KILL_NEVER if
> not given (apr-kill-condition = ... ?).
>
In my testing of the svn 1.6 client, I found that I end up with a  
*lot* of ssh connections to the server sitting around doing nothing.  
They aren't zombies, they're fully-fledged ssh processes hanging  
around doing nothing with no svn process left to talk to. This could  
get very bad on the server if a lot of users end up with a lot of open  
idle connections.

It's easily reproducible with the following trivial command:
for x in `seq 1 10`; do svn log $repos|head; done

You end up with 10 ssh processes left on your machine in the  
background, and similarly on the server side. And they seem to stay  
that way forever. They don't die "in their own time", they stick  
around forever (at least 2 days, I don't know if they'd die at some  
point...).
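
To tell these lingering but live ssh processes apart from true zombies, the process table can be summarized by state. This is a minimal sketch, not something from the original report: classify_ssh is a hypothetical helper name, and treating "parent is PID 1" as "orphaned" is an assumption that does not hold on systems where orphans are adopted by a subreaper.

```shell
#!/bin/sh
# classify_ssh (hypothetical helper): reads "pid ppid stat comm" lines on
# stdin and counts ssh processes that are zombies (state Z), orphaned
# (parent is PID 1; an assumption, see above) or still attached to a parent.
classify_ssh() {
    awk '$4 ~ /(^|\/)ssh$/ {
             if ($3 ~ /^Z/)    zombie++
             else if ($2 == 1) orphan++
             else              attached++
         }
         END { printf "zombie=%d orphan=%d attached=%d\n", zombie, orphan, attached }'
}

# Canned sample so the sketch can be tried without a live server.
sample='100 1 Ss ssh
101 50 Z ssh
102 50 S ssh
103 1 S bash'
summary=$(printf '%s\n' "$sample" | classify_ssh)
echo "$summary"
```

Against a live system the input would come from something like "ps -e -o pid= -o ppid= -o stat= -o comm= | classify_ssh", assuming a ps that supports the stat keyword (procps and BSD ps do).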

This is not reasonable...and didn't happen on older versions of svn.  
I'll note that the comment which was removed in r35533 even seemed to  
*SAY* this was one of the reasons for the kill to exist!

I also tested with Kyle's patch from the end of bug 2580. As I'd  
expect, it does not help this problem one bit, since it only attempts  
to solve the zombie problem, not the ssh-hangs-around-forever problem.  
However, backing out 35533 *does* fix the issue.

I'd like to recommend that the right thing to do now is simply revert  
change 35533.

If people want to do ssh connection-pooling, they can simply open up a  
control connection first, and set up svn to never be a control-master.
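
That arrangement might look roughly like the following. It is a sketch only: svn.example.org and the repository URL are placeholders, the run helper is hypothetical and merely prints each command unless DRY_RUN is set to 0, a ControlPath is assumed to be configured in ~/.ssh/config, and the [tunnels] ssh entry is assumed to still honour $SVN_SSH.

```shell
#!/bin/sh
DRY_RUN=1                 # set to 0 to really run the commands
host=svn.example.org      # placeholder host

# run (hypothetical helper): print the command in dry-run mode, else execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

# 1. Open a long-lived master yourself: -M master mode, -N no remote
#    command, -f go to the background after authentication.
open_cmd=$(run ssh -M -N -f "$host")

# 2. Make svn's own ssh a pure client of that socket, never the master.
SVN_SSH='ssh -o ControlMaster=no'; export SVN_SSH
svn_cmd=$(run svn log "svn+ssh://$host/repos")

# 3. When finished, ask the master to exit.
close_cmd=$(run ssh -O exit "$host")

printf '%s\n' "$open_cmd" "$svn_cmd" "$close_cmd"
```

With this split, killing svn's ssh client never takes the shared connection down, because svn's ssh is never the process that owns the control socket.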

James

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1523947

Re: zombie ssh processes

Posted by Peter Samuelson <pe...@p12n.org>.
[Clark S. Cox III]
> - APR_KILL_{ALWAYS,AFTER_TIMEOUT,ONLY_ONCE} are inappropriate in the  
> connection sharing case. When the svn co completed, the shared  
> connection was severed, and the second ssh session was disconnected,  
> however with those options, the ssh subprocesses of the library-using  
> clients (such as Xcode) were properly reaped.
> 
> - APR_KILL_NEVER is inappropriate in the GUI client case (ssh  
> subprocesses of Xcode were not reaped until Xcode itself was  
> terminated).
> 
> - APR_JUST_WAIT is inappropriate, as the process (whether it be svn or  
> Xcode) that started the master ssh connection blocks until any other  
> ssh's using the same shared connection are terminated.
> 
> It seems that OpenSSH's connection sharing and svn's use of ssh are  
> fundamentally incompatible. At the very least, it doesn't seem that  
> there is a simple fix for both issues.

Some years ago, in Debian, I applied two patches related to ssh
connection sharing.  One uses APR_KILL_AFTER_TIMEOUT, because as we've
all noted, SIGKILL causes various problems with connection sharing and
other things.  The other is a simple patch to the default
~/.subversion/config file:

    [tunnels]
    ssh = $SVN_SSH ssh -o ControlMaster=no

so that Subversion's ssh process never controls the shared socket.
It will still _use_ connection sharing, but will not _initiate_ it.

I would have proposed both patches for integration a long time ago, but
the first one depends on a version of OpenSSH patched not to complain
when it is killed with a non-KILL signal; the second depends on a
version of OpenSSH new enough to recognise the 'ControlMaster' option.
I can rely on both things in Debian, but I don't know that all
Subversion installations could.
-- 
Peter Samuelson | org-tld!p12n!peter | http://p12n.org/

Re: zombie ssh processes WORKAROUND

Posted by Kyle McKay <ma...@gmail.com>.
As a workaround to the problem (see history below), you can do the  
following:

1. Revert change 35533 (or use a prior version such as 1.5.6 that  
doesn't have change 35533) so that apr_pool_note_subprocess is being  
called with APR_KILL_ALWAYS.  APR_KILL_ALWAYS is the fastest option  
(it never waits) and guaranteed not to leave any zombies behind or  
allow ugly SIGTERM-related error messages to be displayed.

2. Create a /usr/local/bin/svn_ssh file with these contents:

#!/bin/sh
exec 3<&0
/bin/sh -c 'exec 0<&3 3<&-; exec "$0" "$@"' ssh "$@" &
exit 0

# NOTE: This depends on the shell having this behavior:
# -c string If the -c option is present, then commands are read
#           from string.  If there are arguments after the string,
#           they are assigned to the positional parameters,
#           starting with $0.
# Standards compliant shells do this.  See
# http://www.opengroup.org/onlinepubs/009695399/utilities/sh.html

3. Make sure it's chmod a+rx (place it somewhere else if you like,  
just adjust the path in the next step).

4. Edit your ~/.subversion/config and in the [tunnels] section do this:

[tunnels]
ssh = $SVN_SSH /usr/local/bin/svn_ssh

5. Finally edit your ~/.ssh/config and add something like this:

ControlMaster auto
ControlPath /tmp/sshpool-%l-%r@%h:%p

That's it.  No zombies left behind and connection pooling works fine  
-- Subversion can even launch the ssh master connection without problem.
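
Why the script parks stdin on descriptor 3: in a non-interactive POSIX shell, a command started with & gets its standard input from /dev/null, which would cut the tunnel off from the pipe Subversion writes to. A small demonstration of that behavior, using cat as a stand-in for ssh (an assumption made purely for illustration):

```shell
#!/bin/sh
# Plain background job: stdin is /dev/null, so cat sees no input.
plain=$(printf 'hello\n' | sh -c 'cat & wait')

# Same job with the svn_ssh trick: keep the original stdin on fd 3,
# then restore it as fd 0 inside the backgrounded shell.
kept=$(printf 'hello\n' | sh -c 'exec 3<&0; sh -c "exec 0<&3 3<&-; exec cat" & wait')

printf 'plain=[%s] kept=[%s]\n' "$plain" "$kept"
```

The wait here exists only so the demonstration can collect its output; the real script deliberately exits without waiting so that the tunnel process is detached.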

There is an apr_procattr_detach_set function that would cause the  
tunnel process to be detached much like the svn_ssh script above does,  
but unfortunately apr_proc_detach (which is what gets called in the  
child to do the detach) freopens stdin, stdout and stderr to /dev/null  
thereby stepping on the stdin/stdout pipes that Subversion actually  
requires to use the tunnel (apr_proc_detach is called after the pipes  
are dup2'd into 0, 1 and 2).

As a workaround for the lack of a suitable apr option to detach but not
redirect stdin/stdout/stderr to /dev/null, the find_tunnel_agent  
function in client.c could be enhanced to add the following three  
arguments to the FRONT of the argv array it generates (shown here as  
comma separated C strings):

"/bin/sh",
"-c",
"exec 3<&0; /bin/sh -c 'exec 0<&3 3<&-; " /* split line for email */
   "exec \"$0\" \"$@\"' \"$0\" \"$@\"& exit 0"

This will cause any tunnel program to be run detached as though set up  
with a script similar to svn_ssh above. (Only when run on a system  
with a standards compliant /bin/sh though.)  Steps #2, #3 and #4 above  
are no longer necessary if this change is made.  (Alternatively  
Subversion could start calling fork/exec directly or get an option
added to apr_procattr_detach_set/apr_proc_detach to not redirect
stdin/stdout/stderr to /dev/null and start using that.)
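
The effect of those three prepended arguments can be checked from a shell without touching Subversion. In this sketch printf stands in for the tunnel program and the trailing words stand in for the argv that find_tunnel_agent would build; both are illustrative assumptions, and stdin is taken from /dev/null only so the example runs anywhere.

```shell
#!/bin/sh
# The third C string from the message, written as one shell string.
prefix='exec 3<&0; /bin/sh -c '\''exec 0<&3 3<&-; exec "$0" "$@"'\'' "$0" "$@"& exit 0'

# printf plays the tunnel program; the words after it are its arguments.
out=$(/bin/sh -c "$prefix" printf '%s|%s\n' host 'svnserve -t' </dev/null)
echo "$out"
```

The argument containing a space arrives intact, which is the point of passing "$0" and "$@" through both shell layers instead of re-parsing a command string.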

Kyle

P.S. It ought to be possible to cram it all into a single "SVN_SSH"  
variable or "[tunnels] ssh =" setting but the proper combination of  
quoting to make apr_tokenize_to_argv AND the nested sh callouts happy  
is eluding me at the moment.

On Mar 27, 2009, at 11:30, Clark S. Cox III wrote:
> On Mar 27, 2009, at 9:51 AM, Hyrum K. Wright wrote:
>
>> On Mar 27, 2009, at 11:21 AM, Kyle McKay wrote:
>>
>>> From the xcode-users mailing list:
>>>
>>>> From: Chris Espinosa <cd...@apple.com>
>>>> Date: March 26, 2009 15:39:14 PDT
>>>> To: Xcode Users <xc...@lists.apple.com>
>>>> Subject: Re: Xcode 3.1.2 and Subversion 1.6
>>>>
>>>> On Mar 25, 2009, at 3:19 PM, Chris Espinosa wrote:
>>>>> On Mar 25, 2009, at 3:07 PM, Rob Lockstone wrote:
>>>>>
>>>>>> Has anyone tried using Xcode 3.1.2 with the subversion 1.6.0
>>>>>> client? I think I recall (but may be wrong) that newer versions of
>>>>>> Xcode don't make assumptions about the version of subversion
>>>>>> that's installed and simply use whatever version it finds.
>>>>>
>>>>> We have not yet qualified any version of Xcode with Subversion
>>>>> 1.6.0 and don't recommend replacing existing Subversion library or
>>>>> client code with 1.6 until we've given it the green light.
>>>>
>>>> We've discovered in internal testing that this patch in Subversion
>>>> 1.6:
>>>>
>>>> http://svn.collab.net/viewvc/svn?view=revision&revision=35533
>>>>
>>>> can cause Subversion 1.6 to leave behind zombie ssh processes every
>>>> time you save a file in Xcode, and eventually exhaust your ability
>>>> to spawn new processes.  We don't recommend using Subversion 1.6
>>>> with Xcode 3.1.x at this time.
>>>>
>>>> Chris
>>>
>>> From:
>>>
>>> http://svn.apache.org/repos/asf/apr/apr/tags/1.0.0/include/apr_thread_proc.h
>>>
>>> APR_KILL_NEVER         // process is never sent any signals
>>> APR_KILL_ALWAYS        // process is sent SIGKILL on apr_pool_t cleanup
>>> APR_KILL_AFTER_TIMEOUT // SIGTERM, wait 3 seconds, SIGKILL
>>> APR_JUST_WAIT          // wait forever for the process to complete
>>> APR_KILL_ONLY_ONCE     // send SIGTERM and then wait
>>>
>>> Restoring the apr_pool_note_subprocess and using APR_KILL_NEVER would
>>> allow the children to be reaped provided they exit before pool
>>> cleanup.
>>>
>>> However, that would likely not eliminate the zombie problem in Xcode
>>> as pool cleanup probably happens faster than ssh cleanup and exit in
>>> some cases.  How about using APR_KILL_AFTER_TIMEOUT or
>>> APR_KILL_ONLY_ONCE (or even APR_JUST_WAIT) ?
>>
>> How would this interact with ssh connection pooling?  The case which
>> drove r35533 was a user who uses ssh connection pooling for svn
>> connections.  Having svn kill the ssh connection is obviously
>> hazardous to such a scheme, how would using the other APR_KILL_*
>> conditions behave there (and would they fix the problem with XCode)?
>
>
> I've built with each of the above options passed to  
> apr_pool_note_subprocess. For each case, I:
>
> 1) Started an 'svn co' over svn+ssh (which launched the ssh control  
> master).
> 2) ssh-ed to the same server over the shared connection
> 3) waited for the co to complete
>
> I also:
> Launched Xcode, and did the GUI equivalent to an svn co
>
>
> - APR_KILL_{ALWAYS,AFTER_TIMEOUT,ONLY_ONCE} are inappropriate in the  
> connection sharing case. When the svn co completed, the shared  
> connection was severed, and the second ssh session was disconnected,  
> however with those options, the ssh subprocesses of the library-using
> clients (such as Xcode) were properly reaped.
>
> - APR_KILL_NEVER is inappropriate in the GUI client case (ssh  
> subprocesses of Xcode were not reaped until Xcode itself was  
> terminated).
>
> - APR_JUST_WAIT is inappropriate, as the process (whether it be svn  
> or Xcode) that started the master ssh connection blocks until any  
> other ssh's using the same shared connection are terminated.
>
> It seems that OpenSSH's connection sharing and svn's use of ssh are  
> fundamentally incompatible. At the very least, it doesn't seem that  
> there is a simple fix for both issues.
>
> -- 
> Clark S. Cox III
> clark.cox@apple.com

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1452200

To unsubscribe from this discussion, e-mail: [users-unsubscribe@subversion.tigris.org].

Re: zombie ssh processes

Posted by "Clark S. Cox III" <cl...@apple.com>.
On Mar 27, 2009, at 9:51 AM, Hyrum K. Wright wrote:

> On Mar 27, 2009, at 11:21 AM, Kyle McKay wrote:
>
>> From the xcode-users mailing list:
>>
>>> From: Chris Espinosa <cd...@apple.com>
>>> Date: March 26, 2009 15:39:14 PDT
>>> To: Xcode Users <xc...@lists.apple.com>
>>> Subject: Re: Xcode 3.1.2 and Subversion 1.6
>>>
>>> On Mar 25, 2009, at 3:19 PM, Chris Espinosa wrote:
>>>> On Mar 25, 2009, at 3:07 PM, Rob Lockstone wrote:
>>>>
>>>>> Has anyone tried using Xcode 3.1.2 with the subversion 1.6.0
>>>>> client? I think I recall (but may be wrong) that newer versions of
>>>>> Xcode don't make assumptions about the version of subversion
>>>>> that's installed and simply use whatever version it finds.
>>>>
>>>> We have not yet qualified any version of Xcode with Subversion
>>>> 1.6.0 and don't recommend replacing existing Subversion library or
>>>> client code with 1.6 until we've given it the green light.
>>>
>>> We've discovered in internal testing that this patch in Subversion
>>> 1.6:
>>>
>>> http://svn.collab.net/viewvc/svn?view=revision&revision=35533
>>>
>>> can cause Subversion 1.6 to leave behind zombie ssh processes every
>>> time you save a file in Xcode, and eventually exhaust your ability
>>> to spawn new processes.  We don't recommend using Subversion 1.6
>>> with Xcode 3.1.x at this time.
>>>
>>> Chris
>>
>> From:
>>
>> http://svn.apache.org/repos/asf/apr/apr/tags/1.0.0/include/apr_thread_proc.h
>>
>> APR_KILL_NEVER         // process is never sent any signals
>> APR_KILL_ALWAYS        // process is sent SIGKILL on apr_pool_t cleanup
>> APR_KILL_AFTER_TIMEOUT // SIGTERM, wait 3 seconds, SIGKILL
>> APR_JUST_WAIT          // wait forever for the process to complete
>> APR_KILL_ONLY_ONCE     // send SIGTERM and then wait
>>
>> Restoring the apr_pool_note_subprocess and using APR_KILL_NEVER would
>> allow the children to be reaped provided they exit before pool
>> cleanup.
>>
>> However, that would likely not eliminate the zombie problem in Xcode
>> as pool cleanup probably happens faster than ssh cleanup and exit in
>> some cases.  How about using APR_KILL_AFTER_TIMEOUT or
>> APR_KILL_ONLY_ONCE (or even APR_JUST_WAIT) ?
>
> How would this interact with ssh connection pooling?  The case which
> drove r35533 was a user who uses ssh connection pooling for svn
> connections.  Having svn kill the ssh connection is obviously
> hazardous to such a scheme, how would using the other APR_KILL_*
> conditions behave there (and would they fix the problem with XCode)?


I've built with each of the above options passed to  
apr_pool_note_subprocess. For each case, I:

1) Started an 'svn co' over svn+ssh (which launched the ssh control  
master).
2) ssh-ed to the same server over the shared connection
3) waited for the co to complete

I also:
Launched Xcode, and did the GUI equivalent to an svn co


- APR_KILL_{ALWAYS,AFTER_TIMEOUT,ONLY_ONCE} are inappropriate in the  
connection sharing case. When the svn co completed, the shared  
connection was severed, and the second ssh session was disconnected,  
however with those options, the ssh subprocesses of the library-using  
clients (such as Xcode) were properly reaped.

- APR_KILL_NEVER is inappropriate in the GUI client case (ssh  
subprocesses of Xcode were not reaped until Xcode itself was  
terminated).

- APR_JUST_WAIT is inappropriate, as the process (whether it be svn or  
Xcode) that started the master ssh connection blocks until any other  
ssh's using the same shared connection are terminated.

It seems that OpenSSH's connection sharing and svn's use of ssh are  
fundamentally incompatible. At the very least, it doesn't seem that  
there is a simple fix for both issues.
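
For readers who want to see what the "SIGTERM, wait 3 seconds, SIGKILL" policy means for a child process, here is a shell analogue. It is not APR's implementation: kill_after_timeout is a hypothetical helper, the one-second polling is an assumption, and plain sleep processes stand in for ssh.

```shell
#!/bin/sh
# kill_after_timeout (hypothetical helper): send SIGTERM, give the child up
# to three seconds to go away, then send SIGKILL. The child is always
# reaped, and its exit status is left in $reaped_status.
kill_after_timeout() {
    kat_pid=$1
    kat_left=3
    kill -TERM "$kat_pid" 2>/dev/null || true
    # Poll once a second for up to three seconds.
    while [ "$kat_left" -gt 0 ] && kill -0 "$kat_pid" 2>/dev/null; do
        sleep 1
        kat_left=$((kat_left - 1))
    done
    if kill -0 "$kat_pid" 2>/dev/null; then
        kill -KILL "$kat_pid" 2>/dev/null || true
    fi
    reaped_status=0
    wait "$kat_pid" 2>/dev/null || reaped_status=$?   # reap: no zombie left
}

sleep 30 &                                   # child that honours SIGTERM
kill_after_timeout "$!"; term_status=$reaped_status

sh -c 'trap "" TERM; exec sleep 30' &        # child that ignores SIGTERM
stubborn=$!
sleep 1                                      # let it install the trap first
kill_after_timeout "$stubborn"; kill_status=$reaped_status

echo "TERM case: $term_status, KILL case: $kill_status"
```

Either way the child ends up signalled, which is exactly what breaks a shared connection when that child happens to be the ssh control master.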

-- 
Clark S. Cox III
clark.cox@apple.com

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1447105

Re: zombie ssh processes

Posted by Kyle McKay <ma...@gmail.com>.
On Mar 27, 2009, at 09:51, Hyrum K. Wright wrote:
> On Mar 27, 2009, at 11:21 AM, Kyle McKay wrote:
>
>> From the xcode-users mailing list:
>>
>>> From: Chris Espinosa <cd...@apple.com>
>>> Date: March 26, 2009 15:39:14 PDT
>>> To: Xcode Users <xc...@lists.apple.com>
>>> Subject: Re: Xcode 3.1.2 and Subversion 1.6
>>>
>>> On Mar 25, 2009, at 3:19 PM, Chris Espinosa wrote:
>>>> On Mar 25, 2009, at 3:07 PM, Rob Lockstone wrote:
>>>>
>>>>> Has anyone tried using Xcode 3.1.2 with the subversion 1.6.0
>>>>> client? I think I recall (but may be wrong) that newer versions of
>>>>> Xcode don't make assumptions about the version of subversion
>>>>> that's installed and simply use whatever version it finds.
>>>>
>>>> We have not yet qualified any version of Xcode with Subversion
>>>> 1.6.0 and don't recommend replacing existing Subversion library or
>>>> client code with 1.6 until we've given it the green light.
>>>
>>> We've discovered in internal testing that this patch in Subversion
>>> 1.6:
>>>
>>> http://svn.collab.net/viewvc/svn?view=revision&revision=35533
>>>
>>> can cause Subversion 1.6 to leave behind zombie ssh processes every
>>> time you save a file in Xcode, and eventually exhaust your ability
>>> to spawn new processes.  We don't recommend using Subversion 1.6
>>> with Xcode 3.1.x at this time.
>>>
>>> Chris
>>
>> From:
>>
>> http://svn.apache.org/repos/asf/apr/apr/tags/1.0.0/include/apr_thread_proc.h
>>
>> APR_KILL_NEVER         // process is never sent any signals
>> APR_KILL_ALWAYS        // process is sent SIGKILL on apr_pool_t cleanup
>> APR_KILL_AFTER_TIMEOUT // SIGTERM, wait 3 seconds, SIGKILL
>> APR_JUST_WAIT          // wait forever for the process to complete
>> APR_KILL_ONLY_ONCE     // send SIGTERM and then wait
>>
>> Restoring the apr_pool_note_subprocess and using APR_KILL_NEVER would
>> allow the children to be reaped provided they exit before pool  
>> cleanup.
>>
>> However, that would likely not eliminate the zombie problem in Xcode
>> as pool cleanup probably happens faster than ssh cleanup and exit in
>> some cases.  How about using APR_KILL_AFTER_TIMEOUT or
>> APR_KILL_ONLY_ONCE (or even APR_JUST_WAIT) ?
>
> How would this interact with ssh connection pooling?  The case which  
> drove r35533 was a user who uses ssh connection pooling for svn  
> connections.  Having svn kill the ssh connection is obviously  
> hazardous to such a scheme, how would using the other APR_KILL_*  
> conditions behave there (and would they fix the problem with XCode)?
>
> -Hyrum

I used the following configuration in ~/.ssh/config for the following  
TESTs:

ControlMaster auto
ControlPath /tmp/sshpool-%l-%r@%h:%p

Also ~/.ssh/id_rsa.pub key has been added to ~/.ssh/authorized_keys so  
that ssh localhost works without requiring any passwords.

It seems that the master and slave ssh connections are only related  
through the unix socket.  So when the master exits for whatever reason  
(SIGHUP, SIGINT, SIGTERM, SIGKILL etc.) the unix socket goes away and  
all the slave ssh connections die.

------
TEST 1
------
Suppose you do the following (with ssh configured as mentioned above):

ssh localhost sleep 15

And wait 15 seconds.  The ssh process exits normally.

------
TEST 2
------
Do this:

ssh localhost sleep 15

And within 15 seconds, go to another window/terminal/screen and do this:

ssh -t localhost top # The "-t" option is important here

If you go back to the first window, you'll notice that even after 15  
seconds the first ssh process doesn't exit.  Go back to the window  
running top and press "q".  You should see a message about "Shared  
connection to localhost closed." and if you now go back to the first  
ssh process, you'll see that it has exited normally.

------
TEST 3
------
Do this:

ssh localhost sleep 15

And within 15 seconds, go to another window/terminal/screen and do this:

ssh localhost top # DO NOT USE "-t" THIS TIME

If you go back to the first window, you'll notice that even after 15  
seconds the first ssh process doesn't exit.  Go back to the window  
running top and press Ctrl-C.  The second ssh process should exit, but  
you will not see the message about closing the shared connection.  If  
you go back to the first ssh process, you'll see that it's still  
waiting to exit -- the abnormal exit of the second ssh process  
prevents the first (master) ssh process from noticing and it will  
never close now unless you send it one of SIGHUP, SIGINT, SIGTERM,  
SIGKILL etc.

-----------
CONCLUSIONS
-----------
1. For ssh connection pooling with ControlMaster/ControlPath to work,
the ssh master process must not receive any kind of
hup/interrupt/quit/terminate signal.
2. If any of the slave ssh connections fail to exit normally, the
master ssh connection will never exit without some kind of signal sent
to it.
3. ASSUMPTION: The probability of some slave ssh connection exiting
abnormally is low, but > 0.
4. ASSUMPTION: The probability of Subversion's ssh connection being
the master connection is > 0.
5. Given #1, #2, #3, and #4 Subversion will sooner or later leave
behind a running ssh master connection.

6. Omitting the apr_pool_note_subprocess call prevents Subversion from
reaping its children.  This will ALWAYS result in zombies unless there
has been a signal(SIGCHLD, SIG_IGN) call previously (such a call would
be highly unfriendly to any program linked with the Subversion library).
7. When Subversion is accessed directly via the API from the
Subversion library, it may be part of a long-running process that
persists across many Subversion operations.
8. Given #6 and #7 the number of zombies created may eventually
overwhelm the system resources and exhaust your ability to spawn new
processes (this is what's happening with Xcode, but could happen to
any long-running program that links with the Subversion library).

9. The only apr_pool_note_subprocess option that does not send any
signals is APR_KILL_NEVER.  However it will still reap zombie children
provided they have exited by pool cleanup time.
10. ASSUMPTION: The probability of svn's ssh connection exiting after
the pool cleanup is low, but > 0.
11. Given #9 and #10 svn will still probably create a few zombies even
with APR_KILL_NEVER but likely this will be a far smaller number than
without any apr_pool_note_subprocess call at all.

12. You can't have it both ways (never create zombies or stray ssh
processes AND support ssh connection pooling) without some kind of
configuration option as the required signaling behavior is mutually
exclusive.
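
The point about unreaped children is easy to reproduce outside Subversion. In this sketch a short-lived sleep stands in for ssh and a longer sleep stands in for the long-running host program (both are illustrative assumptions); the "parent" forks a child and then never calls wait on it.

```shell
#!/bin/sh
# Build a parent that forks a child and never waits for it: the inner sh
# starts a one-second sleep in the background, records its pid, then
# replaces itself with a longer sleep that knows nothing about the child.
pidfile=$(mktemp)
sh -c 'sleep 1 & echo "$!" > "$0"; exec sleep 4' "$pidfile" &
parent=$!

sleep 2                        # the child has exited by now
child=$(cat "$pidfile")

# kill -0 only tests for existence; a zombie still occupies its slot.
if kill -0 "$child" 2>/dev/null; then lingering=yes; else lingering=no; fi
echo "child $child still in process table: $lingering"

# Once the parent is gone the zombie can be collected by its new parent.
kill "$parent" 2>/dev/null || true
wait "$parent" 2>/dev/null || true
rm -f "$pidfile"
```

Each such leftover entry costs one process slot until the parent exits, which is how a long-lived library client can eventually run out of processes.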

---------------
RECOMMENDATIONS
---------------
1. Short term.  Revert change 35533 and then change the  
APR_KILL_ALWAYS to an APR_KILL_NEVER.  This will likely eliminate most  
of the zombie problems for long-lived processes using the Subversion  
library while remaining compatible with ssh connection pooling  
(ControlMaster/ControlPath).
2. Longer term. Add an option to ~/.subversion/config file ([tunnels]  
section?) that lets you select the apr_kill_conditions_e value passed  
to apr_pool_note_subprocess with it defaulting to APR_KILL_NEVER if  
not given (apr-kill-condition = ... ?).

Kyle

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=1447378

To unsubscribe from this discussion, e-mail: [users-unsubscribe@subversion.tigris.org].

Re: zombie ssh processes

Posted by Kyle McKay <ma...@gmail.com>.
On Mar 27, 2009, at 09:51, Hyrum K. Wright wrote:
> On Mar 27, 2009, at 11:21 AM, Kyle McKay wrote:
>
>> From the xcode-users mailing list:
>>
>>> From: Chris Espinosa <cd...@apple.com>
>>> Date: March 26, 2009 15:39:14 PDT
>>> To: Xcode Users <xc...@lists.apple.com>
>>> Subject: Re: Xcode 3.1.2 and Subversion 1.6
>>>
>>> On Mar 25, 2009, at 3:19 PM, Chris Espinosa wrote:
>>>> On Mar 25, 2009, at 3:07 PM, Rob Lockstone wrote:
>>>>
>>>>> Has anyone tried using Xcode 3.1.2 with the subversion 1.6.0
>>>>> client? I think I recall (but may be wrong) that newer versions of
>>>>> Xcode don't make assumptions about the version of subversion
>>>>> that's installed and simply use whatever version it finds.
>>>>
>>>> We have not yet qualified any version of Xcode with Subversion
>>>> 1.6.0 and don't recommend replacing existing Subversion library or
>>>> client code with 1.6 until we've given it the green light.
>>>
>>> We've discovered in internal testing that this patch in Subversion
>>> 1.6:
>>>
>>> http://svn.collab.net/viewvc/svn?view=revision&revision=35533
>>>
>>> can cause Subversion 1.6 to leave behind zombie ssh processes every
>>> time you save a file in Xcode, and eventually exhaust your ability
>>> to spawn new processes.  We don't recommend using Subversion 1.6
>>> with Xcode 3.1.x at this time.
>>>
>>> Chris
>>
>> From:
>>
>> http://svn.apache.org/repos/asf/apr/apr/tags/1.0.0/include/apr_thread_proc.h
>>
>> APR_KILL_NEVER         // process is never sent any signals
>> APR_KILL_ALWAYS        // process is sent SIGKILL on apr_pool_t cleanup
>> APR_KILL_AFTER_TIMEOUT // SIGTERM, wait 3 seconds, SIGKILL
>> APR_JUST_WAIT          // wait forever for the process to complete
>> APR_KILL_ONLY_ONCE     // send SIGTERM and then wait
>>
>> Restoring the apr_pool_note_subprocess and using APR_KILL_NEVER would
>> allow the children to be reaped provided they exit before pool  
>> cleanup.
>>
>> However, that would likely not eliminate the zombie problem in Xcode
>> as pool cleanup probably happens faster than ssh cleanup and exit in
>> some cases.  How about using APR_KILL_AFTER_TIMEOUT or
>> APR_KILL_ONLY_ONCE (or even APR_JUST_WAIT) ?
>
> How would this interact with ssh connection pooling?  The case which
> drove r35533 was a user who uses ssh connection pooling for svn
> connections.  Having svn kill the ssh connection is obviously
> hazardous to such a scheme.  How would the other APR_KILL_*
> conditions behave there (and would they fix the problem with Xcode)?
>
> -Hyrum

I used this configuration in ~/.ssh/config for the tests below:

ControlMaster auto
ControlPath /tmp/sshpool-%l-%r@%h:%p

Also, the ~/.ssh/id_rsa.pub key has been added to
~/.ssh/authorized_keys so that ssh localhost works without prompting
for a password.

It seems that the master and slave ssh connections are only related  
through the unix socket.  So when the master exits for whatever reason  
(SIGHUP, SIGINT, SIGTERM, SIGKILL etc.) the unix socket goes away and  
all the slave ssh connections die.

------
TEST 1
------
Suppose you do the following (with ssh configured as mentioned above):

ssh localhost sleep 15

And wait 15 seconds.  The ssh process exits normally.

------
TEST 2
------
Do this:

ssh localhost sleep 15

And within 15 seconds, go to another window/terminal/screen and do this:

ssh -t localhost top # The "-t" option is important here

If you go back to the first window, you'll notice that even after 15  
seconds the first ssh process doesn't exit.  Go back to the window  
running top and press "q".  You should see a message about "Shared  
connection to localhost closed." and if you now go back to the first  
ssh process, you'll see that it has exited normally.

------
TEST 3
------
Do this:

ssh localhost sleep 15

And within 15 seconds, go to another window/terminal/screen and do this:

ssh localhost top # DO NOT USE "-t" THIS TIME

If you go back to the first window, you'll notice that even after 15  
seconds the first ssh process doesn't exit.  Go back to the window  
running top and press Ctrl-C.  The second ssh process should exit, but  
you will not see the message about closing the shared connection.  If  
you go back to the first ssh process, you'll see that it's still
waiting to exit -- the abnormal exit of the second ssh process
prevents the first (master) ssh process from noticing that the shared
connection is no longer in use, and it will never exit now unless you
send it a signal such as SIGHUP, SIGINT, SIGTERM, or SIGKILL.

-----------
CONCLUSIONS
-----------
1. For ssh connection pooling with ControlMaster/ControlPath to work,
the ssh master process must not receive any kind of
hup/interrupt/quit/terminate signal.
2. If any of the slave ssh connections fail to exit normally, the  
master ssh connection will never exit without some kind of signal sent  
to it.
3. ASSUMPTION: The probability of some slave ssh connection exiting
abnormally is low, but > 0.
4. ASSUMPTION: The probability of Subversion's ssh connection being
the master connection is > 0.
Conclusion: given #1, #2, #3, and #4, Subversion will sooner or later
leave behind a running ssh master connection.

5. Omitting the apr_pool_note_subprocess call prevents Subversion from  
reaping its children.  This will ALWAYS result in zombies unless there  
has been a signal(SIGCHLD, SIG_IGN) call previously (such a call would  
be highly unfriendly to any program linked with the Subversion library).
6. When Subversion is accessed directly via the API from the  
Subversion library, it may be part of a long-running process that  
persists across many Subversion operations.
7. Given #5 and #6 the number of zombies created may eventually  
overwhelm the system resources and exhaust your ability to spawn new  
processes (this is what's happening with Xcode, but could happen to  
any long-running program that links with the Subversion library).
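
To make #5 through #7 concrete, here is a minimal POSIX sketch of what
"not reaping" means.  It is not Subversion or APR code, and all of the
function names are made up for illustration.  A child that has exited
keeps its process-table slot until the parent waits for it; a
long-running parent that never waits accumulates one such slot per
child.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits at once with exit_status.
 * Returns the child's pid in the parent, or -1 if fork() failed. */
pid_t spawn_short_lived_child(int exit_status)
{
    pid_t pid = fork();
    if (pid == 0)
        _exit(exit_status);   /* child: exit immediately */
    return pid;
}

/* Block until the child has exited, but leave it unreaped (WNOWAIT),
 * i.e. leave it behind as a zombie.  Returns 0 on success, -1 on error. */
int wait_until_zombie(pid_t pid)
{
    siginfo_t info;
    return waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT);
}

/* Returns 1 if the child has exited but has not been reaped (a zombie),
 * 0 if it is still running, and -1 if it is not a waitable child of
 * this process (for example because it has already been reaped). */
int child_is_zombie(pid_t pid)
{
    siginfo_t info;
    info.si_pid = 0;          /* stays 0 if no status is available */
    if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOHANG | WNOWAIT) != 0)
        return -1;
    return info.si_pid == pid ? 1 : 0;
}

/* Reap the child (blocking).  Returns its exit status, or -1 if the
 * wait failed or the child did not exit normally. */
int reap_child(pid_t pid)
{
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling spawn_short_lived_child() and then wait_until_zombie() leaves
the child in the zombie state, where it stays until reap_child() (or
some other wait call) collects it.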

8. The only apr_pool_note_subprocess option that does not send any  
signals is APR_KILL_NEVER.  However it will still reap zombie children  
provided they have exited by pool cleanup time.
9. ASSUMPTION: The probability of svn's ssh connection exiting after  
the pool cleanup is low, but > 0.
10. Given #8 and #9 svn will still probably create a few zombies even  
with APR_KILL_NEVER but likely this will be a far smaller number than  
without any apr_pool_note_subprocess call at all.
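
The behavior described in #8 can be sketched with plain POSIX calls.
This is only an illustration of the idea (never signal, never block,
reap only what has already exited), not APR's actual implementation,
and the function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that sleeps for `seconds` and then exits with status 0.
 * Returns the child's pid in the parent, or -1 if fork() failed. */
pid_t spawn_sleeping_child(unsigned int seconds)
{
    pid_t pid = fork();
    if (pid == 0) {
        sleep(seconds);
        _exit(0);
    }
    return pid;
}

/* Cleanup-time check in the spirit of "never signal": no kill() and no
 * blocking wait.  Returns 1 if the child had already exited and was
 * reaped, 0 if it is still running (it is left alone and becomes a
 * zombie later unless someone waits for it), and -1 on error. */
int reap_if_exited(pid_t pid)
{
    int status;
    pid_t r = waitpid(pid, &status, WNOHANG);
    if (r == pid)
        return 1;
    if (r == 0)
        return 0;
    return -1;
}
```

The 0 return is the case from #9: the child outlives the cleanup, so
nothing reaps it unless the caller checks again later.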

11. You can't have it both ways (never create zombies or stray ssh  
processes AND support ssh connection pooling) without some kind of  
configuration option as the required signaling behavior is mutually  
exclusive.

---------------
RECOMMENDATIONS
---------------
1. Short term.  Revert change 35533 and then change the  
APR_KILL_ALWAYS to an APR_KILL_NEVER.  This will likely eliminate most  
of the zombie problems for long-lived processes using the Subversion  
library while remaining compatible with ssh connection pooling  
(ControlMaster/ControlPath).
2. Longer term. Add an option to ~/.subversion/config file ([tunnels]  
section?) that lets you select the apr_kill_conditions_e value passed  
to apr_pool_note_subprocess with it defaulting to APR_KILL_NEVER if  
not given (apr-kill-condition = ... ?).
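
As a rough sketch of what recommendation 2 might look like on the
parsing side: the option name is the one floated above, but the value
spellings ("never", "after-timeout", and so on), the local enum, and
the function name are all assumptions for illustration.  The enum is a
stand-in for apr_kill_conditions_e so that the sketch compiles without
APR.

```c
#include <stddef.h>
#include <string.h>

/* Local stand-in for APR's apr_kill_conditions_e (hypothetical names). */
typedef enum {
    KILL_NEVER,
    KILL_ALWAYS,
    KILL_AFTER_TIMEOUT,
    JUST_WAIT,
    KILL_ONLY_ONCE
} kill_condition;

/* Parse the value of a hypothetical "apr-kill-condition" option.
 * A missing (NULL) or empty value selects the default, KILL_NEVER.
 * Returns 0 on success; returns -1 and leaves *out untouched if the
 * value is not recognized. */
int parse_kill_condition(const char *value, kill_condition *out)
{
    static const struct {
        const char *name;
        kill_condition cond;
    } table[] = {
        { "never",         KILL_NEVER },
        { "always",        KILL_ALWAYS },
        { "after-timeout", KILL_AFTER_TIMEOUT },
        { "just-wait",     JUST_WAIT },
        { "only-once",     KILL_ONLY_ONCE }
    };
    size_t i;

    if (value == NULL || *value == '\0') {
        *out = KILL_NEVER;    /* default when the option is not given */
        return 0;
    }
    for (i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (strcmp(value, table[i].name) == 0) {
            *out = table[i].cond;
            return 0;
        }
    }
    return -1;                /* unknown value: let the caller report it */
}
```

Rejecting unknown values instead of silently falling back to the
default seems safer here, since a typo would otherwise change the
signaling behavior without any warning.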

Kyle

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1447379

Re: zombie ssh processes

Posted by "Hyrum K. Wright" <hy...@mail.utexas.edu>.
On Mar 27, 2009, at 11:21 AM, Kyle McKay wrote:

> From the xcode-users mailing list:
>
>> From: Chris Espinosa <cd...@apple.com>
>> Date: March 26, 2009 15:39:14 PDT
>> To: Xcode Users <xc...@lists.apple.com>
>> Subject: Re: Xcode 3.1.2 and Subversion 1.6
>>
>> On Mar 25, 2009, at 3:19 PM, Chris Espinosa wrote:
>>> On Mar 25, 2009, at 3:07 PM, Rob Lockstone wrote:
>>>
>>>> Has anyone tried using Xcode 3.1.2 with the subversion 1.6.0
>>>> client? I think I recall (but may be wrong) that newer versions of
>>>> Xcode don't make assumptions about the version of subversion
>>>> that's installed and simply use whatever version it finds.
>>>
>>> We have not yet qualified any version of Xcode with Subversion
>>> 1.6.0 and don't recommend replacing existing Subversion library or
>>> client code with 1.6 until we've given it the green light.
>>
>> We've discovered in internal testing that this patch in Subversion
>> 1.6:
>>
>> http://svn.collab.net/viewvc/svn?view=revision&revision=35533
>>
>> can cause Subversion 1.6 to leave behind zombie ssh processes every
>> time you save a file in Xcode, and eventually exhaust your ability
>> to spawn new processes.  We don't recommend using Subversion 1.6
>> with Xcode 3.1.x at this time.
>>
>> Chris
>
> From:
>
> http://svn.apache.org/repos/asf/apr/apr/tags/1.0.0/include/apr_thread_proc.h
>
> APR_KILL_NEVER         // process is never sent any signals
> APR_KILL_ALWAYS        // process is sent SIGKILL on apr_pool_t cleanup
> APR_KILL_AFTER_TIMEOUT // SIGTERM, wait 3 seconds, SIGKILL
> APR_JUST_WAIT          // wait forever for the process to complete
> APR_KILL_ONLY_ONCE     // send SIGTERM and then wait
>
> Restoring the apr_pool_note_subprocess and using APR_KILL_NEVER would
> allow the children to be reaped provided they exit before pool  
> cleanup.
>
> However, that would likely not eliminate the zombie problem in Xcode
> as pool cleanup probably happens faster than ssh cleanup and exit in
> some cases.  How about using APR_KILL_AFTER_TIMEOUT or
> APR_KILL_ONLY_ONCE (or even APR_JUST_WAIT) ?

How would this interact with ssh connection pooling?  The case which
drove r35533 was a user who uses ssh connection pooling for svn
connections.  Having svn kill the ssh connection is obviously
hazardous to such a scheme.  How would the other APR_KILL_*
conditions behave there (and would they fix the problem with Xcode)?

-Hyrum

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1445856