You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Yun Zheng Hu <yu...@gmail.com> on 2005/08/24 22:40:07 UTC

[PATCH] fix non interruptable hang in svn client when connecting

[[[
When you connect to a non existing repository, but the host exists,
the apr_socket_connect will loop forever somehow, and is NOT
interruptable.
On OS X this will result in a inresponsive system, which is very bad. 

How to reproduce:
 svn co svn://svn.edgewall.com

There is no svn server running on that port, so try a CTRL-C to
cancel, but its not possible.. it will loop forever. process is not
killable on OS X either.

I found out that somebody else was having this problem too, but could
not find a bug report and neither the fix. The user that was having
this same bug can be read on:

http://svn.haxx.se/users/archive-2005-07/1027.shtml

The following patch will set a timeout of 10 seconds on apr_socket_connect.
So if it cant connect it will timeout safely, 10 seconds should be
more than enough I think.
The patch will also include the port number in the error message, it
might be useful or not..
]]]

The patch is very simple, open up:
subversion/libsvn_ra_svn/client.c

search for: "apr_socket_connect", and change the code to:

  status = apr_setsocketopt(*sock, APR_SO_TIMEOUT, 10 * APR_USEC_PER_SEC);
  if (status)
    return svn_error_wrap_apr(status, _("Can't set timeout"));

  status = apr_socket_connect(*sock, sa);
  if (status)
    return svn_error_wrap_apr(status, _("Can't connect to host '%s' on
port %u"),
                              hostname, port);

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org


Re: [PATCH] fix non interruptable hang in svn client when connecting

Posted by kf...@collab.net.
Malcolm Rowe <ma...@farside.org.uk> writes:
> Yes, and it seems like that's exactly what happening.
> 
> Karl: it's far _far_ more serious than it first appears. Sending the svn
> process a SIGINT actually hangs the _entire_ network stack, at least for
> new connections. _Any_ process that subsequently attempts to connect()
> hangs and enters an unkillable state.

Wow.  That's... bad.  I think we can be pretty sure the Mac OS X
kernel team is working on this one, then :-).

> I think the most sensible thing would be to apply Yun's patch, though we
> should note that we haven't quite worked this one out yet.

Done, and the comment there points to this thread.

-Karl

-- 
www.collab.net  <>  CollabNet  |  Distributed Development On Demand

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] fix non interruptable hang in svn client when connecting

Posted by Malcolm Rowe <ma...@farside.org.uk>.
On Thu, Aug 25, 2005 at 01:36:37PM -0400, Michael Sweet wrote:
> >It sounds as though you're saying that the Mac OS X (blocking) version of
> >connect() will hang the process and make it unkillable if you send it a
> >SIGINT while it's waiting for a connection to complete.
> >
> >That seems pretty weird to me - is that really what's happening? (although
> >I must admit that I don't understand how the process could block SIGKILL
> >anyway).
> A process can't block SIGKILL, however if the process is waiting on
> a system call, the SIGKILL will not be delivered until the process
> goes back to user space...

Yes, and it seems like that's exactly what happening.

Karl: it's far _far_ more serious than it first appears. Sending the svn
process a SIGINT actually hangs the _entire_ network stack, at least for
new connections. _Any_ process that subsequently attempts to connect()
hangs and enters an unkillable state.

The same behaviour is also exhibited with Yun's original testcase, though
I'm not convinced that that is necessarily representative, given that it's
connecting to the broadcast address.

I think the most sensible thing would be to apply Yun's patch, though we
should note that we haven't quite worked this one out yet.

Regards,
Malcolm

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] fix non interruptable hang in svn client when connecting

Posted by Michael Sweet <mi...@easysw.com>.
Malcolm Rowe wrote:
> On Thu, Aug 25, 2005 at 05:39:01PM +0200, Yun Zheng Hu wrote:
> 
>>If we dont set a timeout, the apr_connect function will use a blocking
>>socket connect but does not timeout if the user sent an interrupt (the
>>bug).
>>If we do set a timeout it will be a nonblocking connect, and resort to
>>apr_poll for proper timeout (see apr/network_io/unix/sockets.c).
> 
> 
> It sounds as though you're saying that the Mac OS X (blocking) version of
> connect() will hang the process and make it unkillable if you send it a
> SIGINT while it's waiting for a connection to complete.
> 
> That seems pretty weird to me - is that really what's happening? (although
> I must admit that I don't understand how the process could block SIGKILL
> anyway).

A process can't block SIGKILL, however if the process is waiting on
a system call, the SIGKILL will not be delivered until the process
goes back to user space...

-- 
______________________________________________________________________
Michael Sweet, Easy Software Products           mike at easysw dot com
Internet Printing and Document Software          http://www.easysw.com

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] fix non interruptable hang in svn client when connecting

Posted by Malcolm Rowe <ma...@farside.org.uk>.
On Thu, Aug 25, 2005 at 05:39:01PM +0200, Yun Zheng Hu wrote:
> If we dont set a timeout, the apr_connect function will use a blocking
> socket connect but does not timeout if the user sent an interrupt (the
> bug).
> If we do set a timeout it will be a nonblocking connect, and resort to
> apr_poll for proper timeout (see apr/network_io/unix/sockets.c).

It sounds as though you're saying that the Mac OS X (blocking) version of
connect() will hang the process and make it unkillable if you send it a
SIGINT while it's waiting for a connection to complete.

That seems pretty weird to me - is that really what's happening? (although
I must admit that I don't understand how the process could block SIGKILL
anyway).

Regards,
Malcolm

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] fix non interruptable hang in svn client when connecting

Posted by kf...@collab.net.
Yun Zheng Hu <yu...@gmail.com> writes:
> But here's how to reproduce the hanging process bug:
> 
> Do the same command, but interrupt it with a ctrl-c (one or multiple
> times, doesnt matter):
> $ time svn co svn://svn.edgewall.com
> ^C
> 
> This will result in a hanging process, at least on OS X. 
> Its also not killable with a 'kill -9' after you interrupt it, because
> the signal_handler is setup to ignore future interrupts (SIG_IGN).
> Ofcourse you could not interrupt it and let it timeout properly, but
> usually the user knows he made a typo of some kind and want to kill
> the process asap.

Oh, I know, I reproduced it myself on Debian GNU/Linux too.

> The bug might also be related to apr_signal, but havent really tested yet.
> 
> By setting a timeout on the socket the process will become killable
> again, the timeout can be 10 seconds or 4 minutes or whatever, as long
> as we have a timeout.
> If we dont set a timeout, the apr_connect function will use a blocking
> socket connect but does not timeout if the user sent an interrupt (the
> bug).
> If we do set a timeout it will be a nonblocking connect, and resort to
> apr_poll for proper timeout (see apr/network_io/unix/sockets.c).

In other words, it *becomes* interruptable *if* we set a timeout, on
Mac OS/X?  That's interesting, and unexpected!

I tried the patch below (same as my earlier patch, but with a
different timeout length, and better comments), and it did not change
the interrupt behavior on Debian GNU/Linux for me.  However, at least
it timed out much sooner.

I could easily believe that Mac OS/X and Linux behave differently in
this circumstance.  I'm tempted to apply the patch anyway, therefore,
but first would like you to confirm the above summary of Mac OS/X
behavior.

Thanks,
-Karl

-- 
www.collab.net  <>  CollabNet  |  Distributed Development On Demand

[[[
Fix a bug whereby connecting to a non-existent repository can cause
apr_socket_connect() to hang non-interruptably.  To reproduce, try
'svn co svn://svn.edgewall.com'.

Patch by: Yun Zheng Hu <yu...@gmail.com>
(Tweaked by me to use apr_socket_set_timeout instead of apr_setsocketopt.)

* subversion/libsvn_ra_svn/client.c
  (make_connection): Set socket timeout to 30 seconds.
]]]

Index: subversion/libsvn_ra_svn/client.c
===================================================================
--- subversion/libsvn_ra_svn/client.c	(revision 15904)
+++ subversion/libsvn_ra_svn/client.c	(working copy)
@@ -136,6 +136,22 @@
   if (status)
     return svn_error_wrap_apr(status, _("Can't create socket"));
 
+  /* Setting a timeout does more than just cause the attempt to fail
+     automatically after 30 seconds without success; it makes
+     apr_socket_connect() interruptable, which it otherwise would not
+     be, at least on Mac OS X.  See this message and thread for more:
+
+     http://subversion.tigris.org/servlets/ReadMsg?list=dev&msgNo=104587
+     From: Yun Zheng Hu <yu...@gmail.com>
+     To: dev@subversion.tigris.org
+     Subject: [PATCH] fix non interruptable hang in svn client when connecting
+     Date: Thu, 25 Aug 2005 00:40:07 +0200
+     Message-ID: <df...@mail.gmail.com>
+  */
+  status = apr_socket_timeout_set(*sock, 30 * APR_USEC_PER_SEC);
+  if (status)
+    return svn_error_wrap_apr(status, _("Can't set timeout"));
+
   status = apr_socket_connect(*sock, sa);
   if (status)
     return svn_error_wrap_apr(status, _("Can't connect to host '%s'"),

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] fix non interruptable hang in svn client when connecting

Posted by Yun Zheng Hu <yu...@gmail.com>.
Sorry for the brackets and not using an unified diff, I think I
misinterpreted the patch submission guidelines a bit :) wont happen
again.

As for the timeout, its seems OS dependant, its even shorter (~1
minute) on OS X:

$ time svn co svn://svn.edgewall.com
svn: Can't connect to host 'svn.edgewall.com': Operation timed out

real    1m14.639s
user    0m0.004s
sys     0m0.011s

$ time svn co svn://svn.edgewall.com
svn: Can't connect to host 'svn.edgewall.com': Operation timed out

real    1m14.908s
user    0m0.004s
sys     0m0.010s

But here's how to reproduce the hanging process bug:

Do the same command, but interrupt it with a ctrl-c (one or multiple
times, doesnt matter):
$ time svn co svn://svn.edgewall.com
^C

This will result in a hanging process, at least on OS X. 
Its also not killable with a 'kill -9' after you interrupt it, because
the signal_handler is setup to ignore future interrupts (SIG_IGN).
Ofcourse you could not interrupt it and let it timeout properly, but
usually the user knows he made a typo of some kind and want to kill
the process asap.

The bug might also be related to apr_signal, but havent really tested yet.

By setting a timeout on the socket the process will become killable
again, the timeout can be 10 seconds or 4 minutes or whatever, as long
as we have a timeout.
If we dont set a timeout, the apr_connect function will use a blocking
socket connect but does not timeout if the user sent an interrupt (the
bug).
If we do set a timeout it will be a nonblocking connect, and resort to
apr_poll for proper timeout (see apr/network_io/unix/sockets.c).

On 24 Aug 2005 22:24:03 -0500, kfogel@collab.net <kf...@collab.net> wrote:
> Yun Zheng Hu <yu...@gmail.com> writes:
> > [[[
> > When you connect to a non existing repository, but the host exists,
> > the apr_socket_connect will loop forever somehow, and is NOT
> > interruptable.
> > On OS X this will result in a inresponsive system, which is very bad.
> >
> > How to reproduce:
> >  svn co svn://svn.edgewall.com
> >
> > There is no svn server running on that port, so try a CTRL-C to
> > cancel, but its not possible.. it will loop forever. process is not
> > killable on OS X either.
> >
> > I found out that somebody else was having this problem too, but could
> > not find a bug report and neither the fix. The user that was having
> > this same bug can be read on:
> >
> > http://svn.haxx.se/users/archive-2005-07/1027.shtml
> >
> > The following patch will set a timeout of 10 seconds on apr_socket_connect.
> > So if it cant connect it will timeout safely, 10 seconds should be
> > more than enough I think.
> > The patch will also include the port number in the error message, it
> > might be useful or not..
> > ]]]
> 
> Thanks for the patch.  I have some technical comments below.  Also,
> there's no need to use the "[[[ ... ]]]" convention unless you are
> actually putting a log message inside the braces.  (You put the main
> body of your message there instead.)
> 
> > The patch is very simple, open up:
> > subversion/libsvn_ra_svn/client.c
> >
> > search for: "apr_socket_connect", and change the code to:
> >
> >   status = apr_setsocketopt(*sock, APR_SO_TIMEOUT, 10 * APR_USEC_PER_SEC);
> >   if (status)
> >     return svn_error_wrap_apr(status, _("Can't set timeout"));
> >
> >   status = apr_socket_connect(*sock, sa);
> >   if (status)
> >     return svn_error_wrap_apr(status, _("Can't connect to host '%s' on
> > port %u"),
> >                               hostname, port);
> >
> 
> I think we shouldn't mix the port-number-in-error-message change with
> the timeout change, as they are two unrelated things.
> 
> Also, do you know how to generate a real patch, that is, something
> that can be applied using the 'patch' program?  Just make your
> modifications and run 'svn diff'.  The result will be a patch, in the
> format expected by all open source projects.
> 
> Now the technical notes:
> 
> apr_setsocketopt() is deprecated, and recommends the use of
> apr_socket_set_opt() instead.  However, if you look at the
> documentation for apr_socket_set_opt(), the APR_SO_TIMEOUT option is
> not listed as one of the valid options.  Instead, one should use
> apr_socket_set_timeout()! :-)
> 
> So that's what I've done in the patch below.
> 
> However, I'm not sure we should apply this patch; at least, I'd like
> other developer's thoughts.  There already seems to be a default
> timeout, it's just long.  It's around 3 minutes, according to my
> tests:
> 
>   $ time svn co svn://svn.edgewall.com
>   subversion/libsvn_ra_svn/client.c:141: (apr_err=110)
>   svn: Can't connect to host 'svn.edgewall.com': Connection timed out
> 
>   real    3m9.548s
>   user    0m0.020s
>   sys     0m0.008s
> 
>   $ time svn co svn://svn.edgewall.com
>   subversion/libsvn_ra_svn/client.c:141: (apr_err=110)
>   svn: Can't connect to host 'svn.edgewall.com': Connection timed out
> 
>   real    3m9.056s
>   user    0m0.022s
>   sys     0m0.005s
> 
>   $
> 
> Now, if we lower that timeout, we're rejecting the default chosen by
> APR, without doing anything to solve the root of the problem, which is
> the lack of interruptability.  On the other hand, the lack of
> interruptability is sort of our fault, since we handle signals
> ourselves.  We have to run our check-cancellation function to actually
> do anything with a signal, and that means we lose whenever we go into
> an APR function like apr_socket_connect() :-(.
> 
> I'm not sure what the right answer is here.  Should we lower the
> socket timeout to 10 seconds, since we have made apr_socket_connect()
> uninterruptable?  Anyone have any thoughts?
> 
> Here's the new patch, anyway:
> 
> [[[
> Fix a bug whereby connecting to a non-existent repository can cause
> apr_socket_connect() to hang non-interruptably.  To reproduce, try
> 'svn co svn://svn.edgewall.com'.
> 
> Patch by: Yun Zheng Hu <yu...@gmail.com>
> (Tweaked by me to use apr_socket_set_timeout instead of apr_setsocketopt.)
> 
> * subversion/libsvn_ra_svn/client.c
>  (make_connection): Set socket timeout to 10 seconds.
> 
> Yun Zheng Hu's original message:
> 
>   http://subversion.tigris.org/servlets/ReadMsg?list=dev&msgNo=104587
>   From: Yun Zheng Hu <yu...@gmail.com>
>   To: dev@subversion.tigris.org
>   Subject: [PATCH] fix non interruptable hang in svn client when connecting
>   Date: Thu, 25 Aug 2005 00:40:07 +0200
>   Message-ID: <df...@mail.gmail.com>
> ]]]
> 
> Index: subversion/libsvn_ra_svn/client.c
> ===================================================================
> --- subversion/libsvn_ra_svn/client.c   (revision 15898)
> +++ subversion/libsvn_ra_svn/client.c   (working copy)
> @@ -136,6 +136,10 @@
>   if (status)
>     return svn_error_wrap_apr(status, _("Can't create socket"));
> 
> +  status = apr_socket_timeout_set(*sock, 10 * APR_USEC_PER_SEC);
> +  if (status)
> +    return svn_error_wrap_apr(status, _("Can't set timeout"));
> +
>   status = apr_socket_connect(*sock, sa);
>   if (status)
>     return svn_error_wrap_apr(status, _("Can't connect to host '%s'"),
> 
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org


Re: [PATCH] fix non interruptable hang in svn client when connecting

Posted by Malcolm Rowe <ma...@farside.org.uk>.
On Fri, Aug 26, 2005 at 12:50:51AM +0100, Philip Martin wrote:
> BSDs traditionally avoid EINTR and automatically restart system calls
> interrupted by a signal (and glibc on Linux has SA_RESTART to control
> the behaviour).  So APR might never see EINTR as the OS could be
> automatically restarting system calls with an appropriately reduced
> timeout.

Ah, yes. And on Linux, that's exactly what's happening. I was getting
confused by the output of strace (connect() 'returns' ERESTARTSYS, but
it doesn't really make it back to the client because glibc restarts the call,
hence (somehow, somewhere) the reduced timeout).

It's possible that the same thing is happening on Darwin (though I can't test
at the moment, and I really don't want to hang the machine _again_), but I
thought I remembered ktrace indicating an EINTR return from connect().

In any case, I sent a bugreport to the darwin-kernel mailing list. I'll post
back here if I get anything informative from that.

Regards,
Malcolm

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] fix non interruptable hang in svn client when connecting

Posted by Philip Martin <ph...@codematters.co.uk>.
Malcolm Rowe <ma...@farside.org.uk> writes:

> There is still one thing that I would have thought would be a problem.
> APR restarts any connect() calls that fail with EINTR automatically

BSDs traditionally avoid EINTR and automatically restart system calls
interrupted by a signal (and glibc on Linux has SA_RESTART to control
the behaviour).  So APR might never see EINTR as the OS could be
automatically restarting system calls with an appropriately reduced
timeout.

-- 
Philip Martin

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] fix non interruptable hang in svn client when connecting

Posted by kf...@collab.net.
Malcolm Rowe <ma...@farside.org.uk> writes:
> Strange. On Mac OS 10.4.2 (Darwin 8.2.1) I can reproduce the problem.
> Note: the original example was an svn:// URL, not an http:// one. I doubt
> it matters, though.

Sorry -- there was a cut and paste problem (long story), but in fact
we both ran with "svn://", not "http://".



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] fix non interruptable hang in svn client when connecting

Posted by Malcolm Rowe <ma...@farside.org.uk>.
On Thu, Aug 25, 2005 at 04:07:12PM -0500, kfogel@collab.net wrote:
> Phillip Susi <ps...@cfl.rr.com> writes:
> > 3) What the heck is wrong with apr_connect?  The call should return
> > almost immediately assuming the remote host is alive but has nothing
> > listening on that port.  If the remote host is not reachable then the
> > connect call should return after some reasonable timeout period, which
> > is usually between 15 and 60 seconds.  In the former case it should
> > indicate that the connection was refused, and in the latter, that the
> > connection timed out.  In both cases, SIGINT/ctrl-c should interrupt
> > and kill the process, and obviously SIGKILL/TerminateProcess() most
> > definately should.
> Another thing that concerns me is that Ben Collins-Sussman, on his Mac
> notebook (OS X 10.4, Darwin kernel 8.2.0), couldn't reproduce most of
> this bug.  He ran
> 
>    $ svn co http://svn.edgewall.com
> 
> It hung, of course.  He tried Ctrl-C several times, and it didn't
> respond.  But when we finally did "kill -9 PID", that killed it.
> There seem to be no ill effects on the network stack from that.

Strange. On Mac OS 10.4.2 (Darwin 8.2.1) I can reproduce the problem.
Note: the original example was an svn:// URL, not an http:// one. I doubt
it matters, though.

I assume that the server isn't responding at all, even with an ICMP message,
so the OS-default timeout applies (~75 seconds on Darwin, ~3 mins on Linux).

There is still one thing that I would have thought would be a problem.
APR restarts any connect() calls that fail with EINTR automatically
(see apr_socket_connect() in apr/network_io/unix/sockets.c), and so I would
have expected that sending a SIGINT during connection would actually
result in a _longer_ time, as APR would restart the connection with a new
timeout, and we can't look at the cancellation flag until after
apr_socket_connect() returns, if at all).

However, on my Linux box, this doesn't seem to happen. I still can't cancel
a pending connect() (APR just restarts it and ignores all future SIGINTs),
but the amount of time spent is exactly the same whether I the first
connect() is interrupted or not. Strange.

I'm guessing it would be too hard to change things so that we don't rely on
APR to restart the connect() (and thus could look at the cancellation flag
before we retry)?

>   If it really has the effect that I thought it had, then it
> would be good, but so far that effect seems to only exist on Yun
> Zheng's computer.  Hmmm.

(if the effect is the system hanging, I get that too. I've not tested r15909.)

Regards,
Malcolm

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] fix non interruptable hang in svn client when connecting

Posted by kf...@collab.net.
Phillip Susi <ps...@cfl.rr.com> writes:
> I don't think that this workaround is really appropriate, as it
> doesn't directly take on the cause of the problem.  The problems as I
> see them are:
> 
> 1) OSX apparently has a bug in its kernel network stack
> 
> 2) The svn client on win32 is explicitly disabling the console ctrl-c
> handler so it can't be killed that way.  Terminating the process in
> task manager works fine.
> 
> 3) What the heck is wrong with apr_connect?  The call should return
> almost immediately assuming the remote host is alive but has nothing
> listening on that port.  If the remote host is not reachable then the
> connect call should return after some reasonable timeout period, which
> is usually between 15 and 60 seconds.  In the former case it should
> indicate that the connection was refused, and in the latter, that the
> connection timed out.  In both cases, SIGINT/ctrl-c should interrupt
> and kill the process, and obviously SIGKILL/TerminateProcess() most
> definately should.

Another thing that concerns me is that Ben Collins-Sussman, on his Mac
notebook (OS X 10.4, Darwin kernel 8.2.0), couldn't reproduce most of
this bug.  He ran

   $ svn co http://svn.edgewall.com

It hung, of course.  He tried Ctrl-C several times, and it didn't
respond.  But when we finally did "kill -9 PID", that killed it.
There seem to be no ill effects on the network stack from that.

So yeah, I'm really wondering if r15909 was a good idea.  I don't want
to hastily revert it, but it's not clear that it's doing a whole lot
of good.  If it really has the effect that I thought it had, then it
would be good, but so far that effect seems to only exist on Yun
Zheng's computer.  Hmmm.

-Karl

-- 
www.collab.net  <>  CollabNet  |  Distributed Development On Demand

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] fix non interruptable hang in svn client when connecting

Posted by Phillip Susi <ps...@cfl.rr.com>.
I don't think that this workaround is really appropriate, as it doesn't 
directly take on the cause of the problem.  The problems as I see them are:

1) OSX apparently has a bug in its kernel network stack

2) The svn client on win32 is explicitly disabling the console ctrl-c 
handler so it can't be killed that way.  Terminating the process in task 
manager works fine.

3) What the heck is wrong with apr_connect?  The call should return 
almost immediately assuming the remote host is alive but has nothing 
listening on that port.  If the remote host is not reachable then the 
connect call should return after some reasonable timeout period, which 
is usually between 15 and 60 seconds.  In the former case it should 
indicate that the connection was refused, and in the latter, that the 
connection timed out.  In both cases, SIGINT/ctrl-c should interrupt and 
kill the process, and obviously SIGKILL/TerminateProcess() most 
definately should.

kfogel@collab.net wrote:
> Okay, I understand now, thanks to you and Michael Sweet.
> 
> Note that I can successfully SIGKILL on Linux, which is expected.  I
> suspect this kernel bug in OS X will be fixed eventually.
> 
> However, I think I'm going to commit the latest patch.  It seems like
> it is an improvement for some people, and at least not harmful for
> everyone else.
> 
> Done in r15909.
> 
> Thanks!
> 
> -Karl
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] fix non interruptable hang in svn client when connecting

Posted by kf...@collab.net.
Yun Zheng Hu <yu...@gmail.com> writes:
> Yes I can confirm that Karl.
> But to be sure if we undestand each other correctly, by it *becomes*
> interruptable again *when* setting a timeout, I mean it becomes
> killable (with a SIGKILL) again. so that means firing up another shell
> and do "kill -9 <pid>".
> 
> And if we dont set a timeout it will result as Malcolm Rowe said: a
> blocking connect which is not killable on OS X when you send a SIGINT
> while its connecting.
> 
> I have made an proof of concept code for testing it on OS X, which
> might be interesting to other OS X users/developers who want to test
> it:

Okay, I understand now, thanks to you and Michael Sweet.

Note that I can successfully SIGKILL on Linux, which is expected.  I
suspect this kernel bug in OS X will be fixed eventually.

However, I think I'm going to commit the latest patch.  It seems like
it is an improvement for some people, and at least not harmful for
everyone else.

Done in r15909.

Thanks!

-Karl

-- 
www.collab.net  <>  CollabNet  |  Distributed Development On Demand

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] fix non interruptable hang in svn client when connecting

Posted by Malcolm Rowe <ma...@farside.org.uk>.
On Thu, Aug 25, 2005 at 03:22:39PM -0400, David James wrote:
> >   server.sin_addr.s_addr = inet_addr("www.google.com");
> It is probably not a good idea to run this code, because it sends a
> potentially infinite number of requests to www.google.com. May I
> suggest targeting 127.0.0.1 instead, for test scripts?

Actually, it won't, as inet_addr can't do DNS lookups, only convert IP
addresses, so it returns -1 (255.255.255.255) instead. I'm not sure what
TCP/IP connections to the broadcast address are supposed to do.

Regards,
Malcolm

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] fix non interruptable hang in svn client when connecting

Posted by Yun Zheng Hu <yu...@gmail.com>.
gotcha, I actually copied the network code from some example, guess I
had to double check it for errors.

But it was more of a proof of concept code to hang the system on OS X,
which shouldnt happen in the first place.

On 8/25/05, Malcolm Rowe <ma...@farside.org.uk> wrote:
> On Thu, Aug 25, 2005 at 09:36:31PM +0200, Yun Zheng Hu wrote:
> >  inet_addr("localhost");
> 
> No, that still doesn't work.
> 
> You need something like inet_addr("72.36.163.131"), and indeed, when you
> try it, you'll see it works fine (it times out, and is interruptible).
> 
> So I won't say that this is completely unrelated, because it generates the
> same symptoms, but it's not a good test case.
> 
> Regards,
> Malcolm
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org


Re: [PATCH] fix non interruptable hang in svn client when connecting

Posted by Malcolm Rowe <ma...@farside.org.uk>.
On Thu, Aug 25, 2005 at 09:36:31PM +0200, Yun Zheng Hu wrote:
>  inet_addr("localhost");

No, that still doesn't work.

You need something like inet_addr("72.36.163.131"), and indeed, when you
try it, you'll see it works fine (it times out, and is interruptible).

So I won't say that this is completely unrelated, because it generates the
same symptoms, but it's not a good test case.

Regards,
Malcolm

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] fix non interruptable hang in svn client when connecting

Posted by Yun Zheng Hu <yu...@gmail.com>.
You are absolutely correct.

please replace:
 inet_addr("www.google.com");
with:
 inet_addr("localhost");


On 8/25/05, David James <ja...@gmail.com> wrote:
> On 8/25/05, Yun Zheng Hu <yu...@gmail.com> wrote:
> >   server.sin_addr.s_addr = inet_addr("www.google.com");
> >
> >   printf("Try killing this process..\n");
> >
> >   do {
> >     rc = connect(sock,
> >                  (const struct sockaddr *)&server,
> >                  sizeof(server));
> >   } while (rc == -1);
> Yun:
> 
> It is probably not a good idea to run this code, because it sends a
> potentially infinite number of requests to www.google.com. May I
> suggest targeting 127.0.0.1 instead, for test scripts?
> 
> Cheers,
> 
> David
> 
> 
> --
> David James -- http://www.cs.toronto.edu/~james
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org


Re: [PATCH] fix non interruptable hang in svn client when connecting

Posted by David James <ja...@gmail.com>.
On 8/25/05, Yun Zheng Hu <yu...@gmail.com> wrote:
>   server.sin_addr.s_addr = inet_addr("www.google.com");
> 
>   printf("Try killing this process..\n");
> 
>   do {
>     rc = connect(sock,
>                  (const struct sockaddr *)&server,
>                  sizeof(server));
>   } while (rc == -1);
Yun:

It is probably not a good idea to run this code, because it sends a
potentially infinite number of requests to www.google.com. May I
suggest targeting 127.0.0.1 instead, for test scripts?

Cheers,

David


-- 
David James -- http://www.cs.toronto.edu/~james

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org


Re: [PATCH] fix non interruptable hang in svn client when connecting

Posted by Yun Zheng Hu <yu...@gmail.com>.
Yes I can confirm that Karl.
But to be sure if we undestand each other correctly, by it *becomes*
interruptable again *when* setting a timeout, I mean it becomes
killable (with a SIGKILL) again. so that means firing up another shell
and do "kill -9 <pid>".

And if we dont set a timeout it will result as Malcolm Rowe said: a
blocking connect which is not killable on OS X when you send a SIGINT
while its connecting.

I have made an proof of concept code for testing it on OS X, which
might be interesting to other OS X users/developers who want to test
it:

----
/* 
Unkillable process proof of concept code. Tested on Mac OS X 10.4.2. 
WARNING: If you run this you will most likely have to manually force a
hard reboot.
*/ 
#include <stdlib.h> 
#include <sys/types.h> 
#include <sys/socket.h> 
#include <netinet/in.h> 

#include <strings.h> 
#include <stdio.h> 

int main(void) 
{ 
  struct sockaddr_in server; 
  int sock, rc; 

  sock = socket( AF_INET, SOCK_STREAM, 0 ); 

  bzero(&server, sizeof(server)); 
  server.sin_family = AF_INET; 
  server.sin_port = htons(12345); 
  server.sin_addr.s_addr = inet_addr("www.google.com"); 

  printf("Try killing this process..\n"); 
  
  do { 
    rc = connect(sock, 
                 (const struct sockaddr *)&server, 
                 sizeof(server)); 
  } while (rc == -1); 

  return 1; 
}
-----

On 24 Aug 2005 22:24:03 -0500, kfogel@collab.net <kf...@collab.net> wrote:
> Yun Zheng Hu <yu...@gmail.com> writes:
> > [[[
> > When you connect to a non existing repository, but the host exists,
> > the apr_socket_connect will loop forever somehow, and is NOT
> > interruptable.
> > On OS X this will result in a inresponsive system, which is very bad.
> >
> > How to reproduce:
> >  svn co svn://svn.edgewall.com
> >
> > There is no svn server running on that port, so try a CTRL-C to
> > cancel, but its not possible.. it will loop forever. process is not
> > killable on OS X either.
> >
> > I found out that somebody else was having this problem too, but could
> > not find a bug report and neither the fix. The user that was having
> > this same bug can be read on:
> >
> > http://svn.haxx.se/users/archive-2005-07/1027.shtml
> >
> > The following patch will set a timeout of 10 seconds on apr_socket_connect.
> > So if it cant connect it will timeout safely, 10 seconds should be
> > more than enough I think.
> > The patch will also include the port number in the error message, it
> > might be useful or not..
> > ]]]
> 
> Thanks for the patch.  I have some technical comments below.  Also,
> there's no need to use the "[[[ ... ]]]" convention unless you are
> actually putting a log message inside the braces.  (You put the main
> body of your message there instead.)
> 
> > The patch is very simple, open up:
> > subversion/libsvn_ra_svn/client.c
> >
> > search for: "apr_socket_connect", and change the code to:
> >
> >   status = apr_setsocketopt(*sock, APR_SO_TIMEOUT, 10 * APR_USEC_PER_SEC);
> >   if (status)
> >     return svn_error_wrap_apr(status, _("Can't set timeout"));
> >
> >   status = apr_socket_connect(*sock, sa);
> >   if (status)
> >     return svn_error_wrap_apr(status, _("Can't connect to host '%s' on
> > port %u"),
> >                               hostname, port);
> >
> 
> I think we shouldn't mix the port-number-in-error-message change with
> the timeout change, as they are two unrelated things.
> 
> Also, do you know how to generate a real patch, that is, something
> that can be applied using the 'patch' program?  Just make your
> modifications and run 'svn diff'.  The result will be a patch, in the
> format expected by all open source projects.
> 
> Now the technical notes:
> 
> apr_setsocketopt() is deprecated, and recommends the use of
> apr_socket_set_opt() instead.  However, if you look at the
> documentation for apr_socket_set_opt(), the APR_SO_TIMEOUT option is
> not listed as one of the valid options.  Instead, one should use
> apr_socket_set_timeout()! :-)
> 
> So that's what I've done in the patch below.
> 
> However, I'm not sure we should apply this patch; at least, I'd like
> other developer's thoughts.  There already seems to be a default
> timeout, it's just long.  It's around 3 minutes, according to my
> tests:
> 
>    $ time svn co svn://svn.edgewall.com
>    subversion/libsvn_ra_svn/client.c:141: (apr_err=110)
>    svn: Can't connect to host 'svn.edgewall.com': Connection timed out
> 
>    real    3m9.548s
>    user    0m0.020s
>    sys     0m0.008s
> 
>    $ time svn co svn://svn.edgewall.com
>    subversion/libsvn_ra_svn/client.c:141: (apr_err=110)
>    svn: Can't connect to host 'svn.edgewall.com': Connection timed out
> 
>    real    3m9.056s
>    user    0m0.022s
>    sys     0m0.005s
> 
>    $
> 
> Now, if we lower that timeout, we're rejecting the default chosen by
> APR, without doing anything to solve the root of the problem, which is
> the lack of interruptability.  On the other hand, the lack of
> interruptability is sort of our fault, since we handle signals
> ourselves.  We have to run our check-cancellation function to actually
> do anything with a signal, and that means we lose whenever we go into
> an APR function like apr_socket_connect() :-(.
> 
> I'm not sure what the right answer is here.  Should we lower the
> socket timeout to 10 seconds, since we have made apr_socket_connect()
> uninterruptable?  Anyone have any thoughts?
> 
> Here's the new patch, anyway:
> 
> [[[
> Fix a bug whereby connecting to a non-existent repository can cause
> apr_socket_connect() to hang non-interruptably.  To reproduce, try
> 'svn co svn://svn.edgewall.com'.
> 
> Patch by: Yun Zheng Hu <yu...@gmail.com>
> (Tweaked by me to use apr_socket_set_timeout instead of apr_setsocketopt.)
> 
> * subversion/libsvn_ra_svn/client.c
>   (make_connection): Set socket timeout to 10 seconds.
> 
> Yun Zheng Hu's original message:
> 
>    http://subversion.tigris.org/servlets/ReadMsg?list=dev&msgNo=104587
>    From: Yun Zheng Hu <yu...@gmail.com>
>    To: dev@subversion.tigris.org
>    Subject: [PATCH] fix non interruptable hang in svn client when connecting
>    Date: Thu, 25 Aug 2005 00:40:07 +0200
>    Message-ID: <df...@mail.gmail.com>
> ]]]
> 
> Index: subversion/libsvn_ra_svn/client.c
> ===================================================================
> --- subversion/libsvn_ra_svn/client.c   (revision 15898)
> +++ subversion/libsvn_ra_svn/client.c   (working copy)
> @@ -136,6 +136,10 @@
>    if (status)
>      return svn_error_wrap_apr(status, _("Can't create socket"));
> 
> +  status = apr_socket_timeout_set(*sock, 10 * APR_USEC_PER_SEC);
> +  if (status)
> +    return svn_error_wrap_apr(status, _("Can't set timeout"));
> +
>    status = apr_socket_connect(*sock, sa);
>    if (status)
>      return svn_error_wrap_apr(status, _("Can't connect to host '%s'"),
> 
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org


Re: [PATCH] fix non interruptable hang in svn client when connecting

Posted by kf...@collab.net.
Yun Zheng Hu <yu...@gmail.com> writes:
> [[[
> When you connect to a non existing repository, but the host exists,
> the apr_socket_connect will loop forever somehow, and is NOT
> interruptable.
> On OS X this will result in a inresponsive system, which is very bad. 
> 
> How to reproduce:
>  svn co svn://svn.edgewall.com
> 
> There is no svn server running on that port, so try a CTRL-C to
> cancel, but its not possible.. it will loop forever. process is not
> killable on OS X either.
> 
> I found out that somebody else was having this problem too, but could
> not find a bug report and neither the fix. The user that was having
> this same bug can be read on:
> 
> http://svn.haxx.se/users/archive-2005-07/1027.shtml
> 
> The following patch will set a timeout of 10 seconds on apr_socket_connect.
> So if it cant connect it will timeout safely, 10 seconds should be
> more than enough I think.
> The patch will also include the port number in the error message, it
> might be useful or not..
> ]]]

Thanks for the patch.  I have some technical comments below.  Also,
there's no need to use the "[[[ ... ]]]" convention unless you are
actually putting a log message inside the braces.  (You put the main
body of your message there instead.)

> The patch is very simple, open up:
> subversion/libsvn_ra_svn/client.c
> 
> search for: "apr_socket_connect", and change the code to:
> 
>   status = apr_setsocketopt(*sock, APR_SO_TIMEOUT, 10 * APR_USEC_PER_SEC);
>   if (status)
>     return svn_error_wrap_apr(status, _("Can't set timeout"));
> 
>   status = apr_socket_connect(*sock, sa);
>   if (status)
>     return svn_error_wrap_apr(status, _("Can't connect to host '%s' on
> port %u"),
>                               hostname, port);
> 

I think we shouldn't mix the port-number-in-error-message change with
the timeout change, as they are two unrelated things.

Also, do you know how to generate a real patch, that is, something
that can be applied using the 'patch' program?  Just make your
modifications and run 'svn diff'.  The result will be a patch, in the
format expected by all open source projects.

Now the technical notes:

apr_setsocketopt() is deprecated, and recommends the use of
apr_socket_set_opt() instead.  However, if you look at the
documentation for apr_socket_set_opt(), the APR_SO_TIMEOUT option is
not listed as one of the valid options.  Instead, one should use
apr_socket_set_timeout()! :-)

So that's what I've done in the patch below.

However, I'm not sure we should apply this patch; at least, I'd like
other developer's thoughts.  There already seems to be a default
timeout, it's just long.  It's around 3 minutes, according to my
tests:

   $ time svn co svn://svn.edgewall.com
   subversion/libsvn_ra_svn/client.c:141: (apr_err=110)
   svn: Can't connect to host 'svn.edgewall.com': Connection timed out
   
   real    3m9.548s
   user    0m0.020s
   sys     0m0.008s

   $ time svn co svn://svn.edgewall.com
   subversion/libsvn_ra_svn/client.c:141: (apr_err=110)
   svn: Can't connect to host 'svn.edgewall.com': Connection timed out
   
   real    3m9.056s
   user    0m0.022s
   sys     0m0.005s

   $ 

Now, if we lower that timeout, we're rejecting the default chosen by
APR, without doing anything to solve the root of the problem, which is
the lack of interruptability.  On the other hand, the lack of
interruptability is sort of our fault, since we handle signals
ourselves.  We have to run our check-cancellation function to actually
do anything with a signal, and that means we lose whenever we go into
an APR function like apr_socket_connect() :-(.

I'm not sure what the right answer is here.  Should we lower the
socket timeout to 10 seconds, since we have made apr_socket_connect()
uninterruptable?  Anyone have any thoughts?

Here's the new patch, anyway:

[[[
Fix a bug whereby connecting to a non-existent repository can cause
apr_socket_connect() to hang non-interruptably.  To reproduce, try
'svn co svn://svn.edgewall.com'.

Patch by: Yun Zheng Hu <yu...@gmail.com>
(Tweaked by me to use apr_socket_set_timeout instead of apr_setsocketopt.)

* subversion/libsvn_ra_svn/client.c
  (make_connection): Set socket timeout to 10 seconds.

Yun Zheng Hu's original message:

   http://subversion.tigris.org/servlets/ReadMsg?list=dev&msgNo=104587
   From: Yun Zheng Hu <yu...@gmail.com>
   To: dev@subversion.tigris.org
   Subject: [PATCH] fix non interruptable hang in svn client when connecting
   Date: Thu, 25 Aug 2005 00:40:07 +0200
   Message-ID: <df...@mail.gmail.com>
]]]

Index: subversion/libsvn_ra_svn/client.c
===================================================================
--- subversion/libsvn_ra_svn/client.c   (revision 15898)
+++ subversion/libsvn_ra_svn/client.c   (working copy)
@@ -136,6 +136,10 @@
   if (status)
     return svn_error_wrap_apr(status, _("Can't create socket"));
 
+  status = apr_socket_timeout_set(*sock, 10 * APR_USEC_PER_SEC);
+  if (status)
+    return svn_error_wrap_apr(status, _("Can't set timeout"));
+
   status = apr_socket_connect(*sock, sa);
   if (status)
     return svn_error_wrap_apr(status, _("Can't connect to host '%s'"),


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org