Posted to derby-dev@db.apache.org by Kathey Marsden <km...@Sourcery.Org> on 2004/10/09 01:41:09 UTC

Help detecting client disconnects for network server

Hello

I am hoping there is a network programming expert out there who can
help me.	

I am working on an issue with Network Server where the server does not
clean up connection threads properly for disconnected clients.  I get an
IOException if the client program is killed or aborted with Ctrl-C,
but if the network cable is just unplugged or the machine turned off, we
don't detect the disconnect properly.


So, if I do this on the client machine

create table t (i int);
autocommit off;
lock table t in exclusive mode;

Then disconnect the cable, the server will continue to block on the
inputStream.read() and the connection will continue to hold the lock so
no one else can select from the table.


Reading a little about this, it seems the only way to really detect if
the socket is active is to attempt a write. This of course is not an
option since it will mess up the DRDA protocol.  I have also looked at
Socket.setSoTimeout, which seems no good because the client might in
fact just be sitting there doing nothing for a long time.

Any ideas?

Thanks

Kathey


Here's the trace where network server blocks.

"DRDAConnThread_3" prio=5 tid=0x0ADCD560 nid=0x7f8 runnable [b7bf000..b7bfd88]
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:129)
        at org.apache.derby.impl.drda.DDMReader.fill(DDMReader.java)
        at org.apache.derby.impl.drda.DDMReader.ensureALayerDataInBuffer(DDMReader.java)
        at org.apache.derby.impl.drda.DDMReader.readDssHeader(DDMReader.java)
        at org.apache.derby.impl.drda.DRDAConnThread.processCommands(DRDAConnThread.java)
        at org.apache.derby.impl.drda.DRDAConnThread.run(DRDAConnThread.java)









Re: Help detecting client disconnects for network server

Posted by Jan Hlavatý <hl...@code.cz>.

Kathey Marsden wrote:
> I am working on an issue with Network Server where the server does not
> clean up connection threads properly for disconnected clients.  I get an
>  IOException if the client program is killed or aborted with <ctrl> c
> but if the network cable is just unplugged or the machine turned off, we
> don't detect the disconnect properly.

I have been looking at the DRDAv3 protocol specs to see if there is any
way to implement protocol-level pings that could be used for client
disconnect detection. It looks to me like the only way to implement a
ping is a client-initiated one, using the EXCSAT command. This could be
used, but requires cooperation from the client. See chapter 11.2.4 of
the spec, vol. 1.
So it would basically need the client to automatically ping the server
from time to time during inactivity. The server could interpret a lack
of these pings as a client crash. But that would not work very well with
third-party DRDA clients that have no idea about the ping we want to
use, nor can we modify the JDBC driver from IBM to do that ;(

Looks like the SO_KEEPALIVE mechanism will have to do.

Jan

[PATCH] Enable SO_KEEPALIVE on client TCP connections

Posted by Jan Hlavatý <hl...@code.cz>.

> In any case, network server should certainly enable SO_KEEPALIVE on client connections.

Here is a patch that does just that. ;)

Jan

Re: Help detecting client disconnects for network server

Posted by Jan Hlavatý <hl...@code.cz>.

> SO_KEEPALIVE works, even for blocking in reads.
> I have tested it.

Tried both Windows XP and Linux (FC2), configured for a 30 second keepalive interval.
Both throw SocketException as specified after the keepalive probes fail, while I'm in a blocking read().

The bad thing is that, without a system-wide configuration change, the timeout defaults to 2 hours.
So it looks like a good idea to configure a shorter interval on the server machine.

In any case, network server should certainly enable SO_KEEPALIVE on client connections.

Jan

Re: [VOTE RESULT] Re: Help detecting client disconnects for network server

Posted by Kathey Marsden <km...@Sourcery.Org>.

Kathey Marsden wrote:
> Samuel Andrew McIntyre wrote:
>
>>>So, I'm still not sure that I like having keepalive set by default
>>>without a way to turn it off.
>
>
> OK, so, given the input from everyone I submit the following solution to
> vote:
>
>
> 1) Have keepAlive on by default. It seems important not only for locks
> but for potential network server bloat due to connections not getting
> cleaned up.
> 	
3 +1's
>
> 2) Add a property derby.drda.keepAlive={true|false} (defaults to true as
> described above).  There seems to be a need to be able to turn keepAlive
> off in some cases.
> 	
>
2  +1's  and a strong argument for it without an actual vote.

We didn't hear any arguments in response to Dan's last question about
why we shouldn't have an option to turn keepalive off.  So, unless
anyone posts any objections, I will submit 1 and 2 on Monday.

> 3) Add property derby.drda.connSoTimeout=<milliseconds> (defaults to 0,
> infinite) to provide the ability to have connections timeout after a
> period of inactivity. The connections will still timeout, even if the
> connection is working fine but will timeout after blocking on a read for
> this length of time.   I am about +.5 on this one.  It would be nice to
> provide the capability, but hesitate to add yet another property.
>
>
I haven't seen a consensus for #3, so I will just abandon this vote.
Another option was proposed, an embedded option which would only time
out the connection if it is holding locks.  Anyone interested in this or
other options can pursue it on another thread.  I won't go down that
path as it is unlikely I would implement something like that myself.


Thanks

Kathey

Re: [VOTE] Re: Help detecting client disconnects for network server

Posted by Kathey Marsden <km...@Sourcery.Org>.

Basically it would call setSoTimeout on the socket.
See:
http://java.sun.com/j2se/1.4.2/docs/api/java/net/Socket.html#setSoTimeout(int)

The time specified is a timeout after a period of inactivity on a
blocking read, not a limit on the time taken to connect.
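
To illustrate the distinction, here is a minimal sketch (not Derby code; the class name ReadTimeoutDemo and its method are hypothetical). It uses only standard java.net calls: the timeout passed to connect() limits connection setup, while setSoTimeout() bounds each blocking read. When the read timeout fires, a SocketTimeoutException is thrown and the socket itself stays open and usable.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

/** Hypothetical demo class, not part of Derby. */
final class ReadTimeoutDemo {

    /** Returns true if a blocking read gave up after idleMillis without data. */
    static boolean readTimesOut(Socket socket, int idleMillis) throws IOException {
        // Bounds each blocking read on this socket; it does not affect connect().
        socket.setSoTimeout(idleMillis);
        try {
            socket.getInputStream().read();
            return false;
        } catch (SocketTimeoutException e) {
            // The socket is still open here; only this read was abandoned.
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket()) {
            // Connect timeout: limits how long establishing the connection may take.
            client.connect(
                new InetSocketAddress(server.getInetAddress(), server.getLocalPort()), 1000);
            try (Socket accepted = server.accept()) {
                // The client says nothing, so the server-side read times out.
                System.out.println("idle read timed out: " + readTimesOut(accepted, 200));
                // The same socket can still deliver data that arrives later.
                client.getOutputStream().write(42);
                System.out.println("later read returned: " + accepted.getInputStream().read());
            }
        }
    }
}
```

In other words, with connSoTimeout set, a healthy but quiet client would hit this exception on the server side, and the server would then choose to close the connection.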

Army wrote:

> Kathey Marsden wrote:
>
>> OK, so, given the input from everyone I submit the following solution to
>> vote:
>
>
> [ snip ]
>
>> 3) [ ...] The connections will still timeout, even if the
>> connection is working fine but will timeout after blocking on a read for
>> this length of time. [ ... ]
>
>
> I'm not sure I understand what this sentence is saying...
>
> Army
>
>

Re: [VOTE] Re: Help detecting client disconnects for network server

Posted by Army <ar...@golux.com>.

Kathey Marsden wrote:

> OK, so, given the input from everyone I submit the following solution to
> vote:

[ snip ]

> 3) [ ...] The connections will still timeout, even if the
> connection is working fine but will timeout after blocking on a read for
> this length of time. [ ... ]

I'm not sure I understand what this sentence is saying...

Army

Re: [VOTE] Re: Help detecting client disconnects for network server

Posted by Jan Hlavatý <hl...@code.cz>.

Suresh Thalamati wrote:

> Which one of the following properties have higher precedence ?   For
> example if  user sets  derby.drda.keepAlive = true  (Assume 2 hours on a
> platform)  and derby.drda.connSoTimeout=  1hour., Connection is going to
> be terminated after 2 hours or 1hour ?

The two are completely unrelated. In your example the connection dies on
the timeout.

Keepalive only applies to broken connections where the other end went down.

The timeout affects live connections that are not broken, just quiet.

Re: [VOTE] Re: Help detecting client disconnects for network server

Posted by Kathey Marsden <km...@Sourcery.Org>.

Suresh Thalamati wrote:
> Which one of the following properties have higher precedence ?   For
> example if  user sets  derby.drda.keepAlive = true  (Assume 2 hours on a
> platform)  and derby.drda.connSoTimeout=  1hour., Connection is going to
> be terminated after 2 hours or 1hour ?
>
> Thanks
> -suresht
>

1 hour if the connection is idle because of the soTimeout.  keepAlive
would never kick in.

Re: [VOTE] Re: Help detecting client disconnects for network server

Posted by Suresh Thalamati <ts...@Source-Zone.org>.

Which of the following properties has higher precedence?  For example,
if the user sets derby.drda.keepAlive=true (assume 2 hours on a
platform) and derby.drda.connSoTimeout to 1 hour, is the connection
going to be terminated after 2 hours or 1 hour?

Thanks
-suresht


Kathey Marsden wrote:

> Samuel Andrew McIntyre wrote:
>
> >So, I'm still not sure that I like having keepalive set by default
> >without a way to turn it off.
>
>
> OK, so, given the input from everyone I submit the following solution to
> vote:
>
>
> 1) Have keepAlive on by default. It seems important not only for locks
> but for potential network server bloat due to connections not getting
> cleaned up.
>     
>
> 2) Add a property derby.drda.keepAlive={true|false} (defaults to true as
> described above).  There seems to be a need to be able to turn keepAlive
> off in some cases.
>     
>
> 3) Add property derby.drda.connSoTimeout=<milliseconds> (defaults to 0,
> infinite) to provide the ability to have connections timeout after a
> period of inactivity. The connections will still timeout, even if the
> connection is working fine but will timeout after blocking on a read for
> this length of time.   I am about +.5 on this one.  It would be nice to
> provide the capability, but hesitate to add yet another property.
>
>
>
> Kathey

Re: [VOTE] Re: Help detecting client disconnects for network server

Posted by Samuel Andrew McIntyre <fu...@nonintuitive.com>.

On Oct 11, 2004, at 4:38 PM, Jan Hlavatý wrote:

> You seem to misunderstand keepalive mechanism.

I admit that exactly what keepalive does and how it interacts with 
SoTimeout is confusing. :) Both the javadoc and the available unix 
documentation for the native methods are not exceptionally clear, and 
it wasn't until I tried each option out with various settings and 
looked at how the underlying timer was implemented that I got a grasp 
on what these options really do.

I'm +1 to enabling keepalive by default. At this point, I'm just trying 
to understand what would be the problem with making it a configurable 
option. Here are the reasons why I think it's a good thing for us to 
make it configurable:

a) keepalive is opaque. Because we cannot get or set the parameters 
which control its behavior, we don't know exactly what the behavior 
will be when it's turned on. So, it might be desirable to turn it off, 
in the interest of consistency and having exact control over the 
behavior of the application.

b) the implementation of keepalive is OS-dependent. That implementation 
could be buggy, or worse, non-existent. This compounds (a), where you 
can't know exactly what the behavior will be when you enable it. So, 
another plus for control of behavior and for configurable timeouts on 
the server end where the mechanism is defined by us and completely in 
our domain of control.

c) interaction with SoTimeout can lead to unexpected behavior if 
keepalive is not configurable. As described earlier, SoTimeout(0) 
doesn't do what we expect if we can't turn off keepalive. Same 
situation if the machine timeout + probe intervals < SoTimeout(). But, 
we can't ever tell whether or not SoTimeout > keepalive timeout because 
of (a). So, another plus one for configurable timeouts on the server 
end.

d) keepalive could be keeping a zombie connection alive. I've 
personally dealt with two systems in the last few months that suffered 
a catastrophic disk/filesystem problem but where the network layer of 
the kernel was still active and happy. I'll spare you the details, but 
network services from these machines were behaving erratically 
depending on what files the services accessed, and it wasn't until I 
stood in front of the console that the cause of the failure became 
clear. In this case, keepalive would have had the opposite of the 
intended effect: a bad high-level connection is kept alive by the 
low-level mechanism. I admit that this is a minor detail, but it can be 
frustrating to determine the cause of failure in such a situation. So, 
another plus one for configurable timeouts on the server end.

e) Dan's points concerning dropping good connections, bandwidth use, 
and charge-by-packet.

f) The use of keepalive is always offered as an option in other 
development environments. Why would we require it?

Like I said, I'm +1 to enabling keepalive by default. In most cases it 
will do what is expected and the effort in making it a configurable 
option is minimal. I just think there are good reasons for making the 
use of keepalive an option and not a requirement. And also, that an 
interesting opportunity lies in finding an alternative method to 
solving the locking issue in the original post of this thread.

andrew

Re: [VOTE] Re: Help detecting client disconnects for network server

Posted by Jan Hlavatý <hl...@code.cz>.

Andrew McIntyre wrote:
> But this is certainly the most expedient way at the moment to reap
> silent client connections, and leaves open the possibility that we can
> have connections that are just silent for a long time, for whatever
> reason, or for some reason, keepalive is set to a lower value on the
> system than is desirable for the particular Derby application. +1

You seem to misunderstand the keepalive mechanism. Keepalive has no effect
on connections that remain silent for a long time but are otherwise OK.
It's not a timeout that kills a quiet connection. It's a connection health check,
triggered by a long period of inactivity, that verifies the other end is still up.
It only kills connections which are broken (where one of the ends fails to ack
the keepalive probe). There is no need to turn off keepalive.
It has no effect on healthy connections. Its only effect is that dead connections, where
one of the ends went down without telling the other end, are detected and closed.

Re: [VOTE] Re: Help detecting client disconnects for network server

Posted by Andrew McIntyre <fu...@nonintuitive.com>.

On Oct 11, 2004, at 10:43 AM, Kathey Marsden wrote:

> 1) Have keepAlive on by default. It seems important not only for locks
> but for potential network server bloat due to connections not getting
> cleaned up.
>
> 2) Add a property derby.drda.keepAlive={true|false} (defaults to true 
> as
> described above).  There seems to be a need to be able to turn 
> keepAlive
> off in some cases.

I'd still rather have an internal timing mechanism instead of relying 
on an external mechanism whose properties we cannot discern or set and 
whose implementation is inconsistent across platforms. But this is 
certainly the most expedient way at the moment to reap silent client 
connections, and it leaves open the possibility that we can have 
connections that are just silent for a long time, for whatever reason, 
or that keepalive is set to a lower value on the system than is 
desirable for the particular Derby application. 
+1

> 3) Add property derby.drda.connSoTimeout=<milliseconds> (defaults to 0,
> infinite) to provide the ability to have connections timeout after a
> period of inactivity. The connections will still timeout, even if the
> connection is working fine but will timeout after blocking on a read 
> for
> this length of time.   I am about +.5 on this one.  It would be nice to
> provide the capability, but hesitate to add yet another property.

It seems like this would be nice for tuning purposes, e.g. in cases 
where you expect a large number of client connects and you want each 
connection to be short-lived. Since the default value is essentially 
the same behavior we have today, I don't see a problem with adding the 
property. +1

andrew

Re: [VOTE] Re: Help detecting client disconnects for network server

Posted by Daniel John Debrunner <dj...@debrunners.com>.

Jan Hlavatý wrote:

> Kathey Marsden wrote:
>
>>2) Add a property derby.drda.keepAlive={true|false} (defaults to true as
>>described above).  There seems to be a need to be able to turn keepAlive
>>off in some cases.
>
>
> And why is that? Who is worried by keepalive and why?
> There is no overhead associated with it.

Is that true, no overhead, or is it *low* overhead?

From a quick Google search I found this, which indicates keepalive is a
controversial feature.

http://home.student.uu.se/j/jolo4453/projekt/tcpip1/tcp_keep.htm#23_0

(which seems to be this actual book)
http://www.aw-bc.com/catalog/academic/product/0,1144,0201633469-TOC,00.html

See this quote
[quote]
Keepalives are not part of the TCP specification. The Host Requirements
RFC provides three reasons not to use them: (1) they can cause perfectly
good connections to be dropped during transient failures, (2) they
consume unnecessary bandwidth, and (3) they cost money on an internet
that charges by the packet.
[end-quote]

So what would be the downside of allowing keepalive to be disabled?

Dan.

Re: [VOTE] Re: Help detecting client disconnects for network server

Posted by Jan Hlavatý <hl...@code.cz>.

Kathey Marsden wrote:
> 2) Add a property derby.drda.keepAlive={true|false} (defaults to true as
> described above).  There seems to be a need to be able to turn keepAlive
> off in some cases.

And why is that? Who is worried by keepalive and why?
There is no overhead associated with it.

Jan

[VOTE] Re: Help detecting client disconnects for network server

Posted by Kathey Marsden <km...@Sourcery.Org>.

Samuel Andrew McIntyre wrote:
> So, I'm still not sure that I like having keepalive set by default
> without a way to turn it off.

OK, so, given the input from everyone I submit the following solution to
vote:

1) Have keepAlive on by default. It seems important not only for locks
but for potential network server bloat due to connections not getting
cleaned up.

2) Add a property derby.drda.keepAlive={true|false} (defaults to true as
described above).  There seems to be a need to be able to turn keepAlive
off in some cases.

3) Add property derby.drda.connSoTimeout=<milliseconds> (defaults to 0,
infinite) to provide the ability to have connections timeout after a
period of inactivity. The connections will still timeout, even if the
connection is working fine but will timeout after blocking on a read for
this length of time.   I am about +.5 on this one.  It would be nice to
provide the capability, but hesitate to add yet another property.
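
A minimal sketch of how proposals 2 and 3 might be applied to an accepted client socket. The property names and defaults are the ones proposed above; the DrdaSocketOptions class and its methods are hypothetical and are not taken from the Derby source. Only standard java.net.Socket calls are used.

```java
import java.net.Socket;
import java.net.SocketException;
import java.util.Properties;

/** Hypothetical helper, not Derby code: maps the proposed properties to socket options. */
final class DrdaSocketOptions {

    /** Proposal 2: keepalive is on unless explicitly set to false. */
    static boolean keepAlive(Properties props) {
        return Boolean.parseBoolean(props.getProperty("derby.drda.keepAlive", "true").trim());
    }

    /** Proposal 3: read timeout in milliseconds; 0 (the default) means wait forever. */
    static int soTimeoutMillis(Properties props) {
        return Integer.parseInt(props.getProperty("derby.drda.connSoTimeout", "0").trim());
    }

    /** Applies both settings to a client socket. */
    static void apply(Socket client, Properties props) throws SocketException {
        // Asks the OS to probe an idle peer; the probe timing is an OS setting.
        client.setKeepAlive(keepAlive(props));
        // Bounds every blocking read, whether or not the peer is healthy.
        client.setSoTimeout(soTimeoutMillis(props));
    }
}
```

With no properties set, this gives keepalive on and no read timeout, which matches the defaults described in items 1 to 3.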

Kathey

Re: Help detecting client disconnects for network server

Posted by Samuel Andrew McIntyre <fu...@nonintuitive.com>.

On Oct 10, 2004, at 4:33 AM, Jan Hlavatý wrote:

> This is simply not true. SO_KEEPALIVE works, even for blocking in 
> reads.
> I have tested it.

Agreed. After writing the previous comments, I decided I should just go 
see how it works in practice. :-) After the keepalive timeout, a 
SocketException is thrown. (Exception in thread "main" 
java.net.SocketException: Operation timed out at 
java.net.SocketInputStream.socketRead0(Native Method))

I also tried the following (all on Mac OS X):

Setting setSoTimeout(0) along with keepalive: the same behavior is 
observed. It would then appear that without a way to turn off 
keepalive, there's no way to specify that we want to keep the socket 
open indefinitely.

Also, in the case where timeout interval > keepalive interval, the 
keepalive timer closes the socket before the timeout value is reached. 
While this may not be a problem for some platforms, the default value 
for net.inet.tcp.keepidle on the Mac is 144s and the interval for 
probing is only 1.5 secs, meaning that with keepalive turned on, 
timeout values greater than 3 minutes would have no effect and 
keepalive would always close the socket at between 2 and 3 minutes. It 
would seem that Mac OS X is violating RFC1122, but still, that's what 
the value currently is on the three systems at my disposal.

So, I'm still not sure that I like having keepalive set by default 
without a way to turn it off. Anyway, I haven't been able to test this 
on other operating systems yet to see if the same behavior is true for 
setSoTimeout/setKeepAlive, but I'm curious if there's any difference 
between Linux, Solaris and Windows, so I'll get back to you on that.

andrew

Re: Help detecting client disconnects for network server

Posted by Jan Hlavatý <hl...@code.cz>.

Samuel Andrew McIntyre wrote:
> The way I read it, setting SO_KEEPALIVE just sets the option of the same
> name in the underlying native implementation. While the underlying TCP
> implementation on the server would keep track of the whether or not the
> client machine responded to probes by the server, the application would
> not actually be notified until a write was attempted on the socket after
> the keepalive timeout (or the timeout expired while we're blocked on a
> write), at which time the native implementation would return SIGPIPE and
> presumably a SocketException would be thrown.
> 
> In the case where we're blocked on a read, the application would never
> be notified. In the case that you need to be able to timeout on a read,
> you would need to use SO_TIMEOUT or implement your own timer. Or am I
> misunderstanding how keepalive works?

This is simply not true. SO_KEEPALIVE works, even for blocking in reads.
I have tested it.

Jan

Re: Help detecting client disconnects for network server

Posted by Jan Hlavatý <hl...@code.cz>.

Samuel Andrew McIntyre wrote:

> The way I read it, setting SO_KEEPALIVE just sets the option of the same
> name in the underlying native implementation. While the underlying TCP
> implementation on the server would keep track of the whether or not the
> client machine responded to probes by the server, the application would
> not actually be notified until a write was attempted on the socket after
> the keepalive timeout (or the timeout expired while we're blocked on a
> write), at which time the native implementation would return SIGPIPE and
> presumably a SocketException would be thrown.
> 
> In the case where we're blocked on a read, the application would never
> be notified. In the case that you need to be able to timeout on a read,
> you would need to use SO_TIMEOUT or implement your own timer. Or am I
> misunderstanding how keepalive works?

Where did you read this? Sounds like pretty bad design, thus improbable ;)
I think I'll test what Linux and XP do.

Jan

Re: Help detecting client disconnects for network server

Posted by Samuel Andrew McIntyre <fu...@nonintuitive.com>.

On Oct 9, 2004, at 4:51 AM, Jan Hlavatý wrote:

> Did you try setting SO_KEEPALIVE socket option using 
> Socket.setKeepAlive(boolean)?
> That what its for.

The way I read it, setting SO_KEEPALIVE just sets the option of the 
same name in the underlying native implementation. While the underlying 
TCP implementation on the server would keep track of whether or not the 
client machine responded to probes by the server, the application would 
not actually be notified until a write was attempted on the socket 
after the keepalive timeout (or the timeout expired while we're blocked 
on a write), at which time the native implementation would return 
SIGPIPE and presumably a SocketException would be thrown.

In the case where we're blocked on a read, the application would never 
be notified. In the case that you need to be able to timeout on a read, 
you would need to use SO_TIMEOUT or implement your own timer. Or am I 
misunderstanding how keepalive works?

andrew

Re: Help detecting client disconnects for network server

Posted by Jan Hlavatý <hl...@code.cz>.

Did you try setting the SO_KEEPALIVE socket option using Socket.setKeepAlive(boolean)?
That's what it's for. The timeout may be really long though (about 2 hours),
but it may be tuned at the OS level (globally).
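
As a minimal sketch of that suggestion (this is not the Derby patch; the KeepAliveAccept class and its method are hypothetical names), enabling keepalive on each accepted client socket could look like this. Socket.setKeepAlive and Socket.getKeepAlive are standard java.net calls; how soon probes are sent is decided by the OS settings, not by Java.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

/** Hypothetical helper, not Derby code. */
final class KeepAliveAccept {

    /** Accepts one client and enables SO_KEEPALIVE on its socket. */
    static Socket acceptWithKeepAlive(ServerSocket server) throws IOException {
        Socket client = server.accept();
        // The OS will now probe the peer after a long idle period and fail
        // pending reads once the probes go unanswered.
        client.setKeepAlive(true);
        return client;
    }

    public static void main(String[] args) throws IOException {
        // Port 0 lets the OS pick a free port; loopback keeps the demo local.
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket peer = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket accepted = acceptWithKeepAlive(server)) {
            System.out.println("keepalive enabled: " + accepted.getKeepAlive());
        }
    }
}
```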

On Linux, you can use sysctl (/etc/sysctl.conf):

net/ipv4/tcp_keepalive_time = <number of seconds to send probe>
net/ipv4/tcp_keepalive_probes = <number of failed probes to kill connection>

Or use /proc to configure directly:
/proc/sys/net/ipv4/tcp_keepalive_time
/proc/sys/net/ipv4/tcp_keepalive_probes

For Win XP:
http://support.microsoft.com/default.aspx?scid=kb;EN-US;314053

For Win NT/2K, see KeepAliveTime and KeepAliveInterval here:
http://support.microsoft.com/default.aspx?scid=kb;EN-US;120642

For Win 95/98/Me:
http://support.microsoft.com/default.aspx?scid=kb;en-us;158474

Also note, unplugging a network cable does not mean the TCP connection is disconnected.
TCP does not track the state of the transport media in any way.
You can unplug the cable, plug it back and happily continue.

Re: Help detecting client disconnects for network server

Posted by Samuel Andrew McIntyre <fu...@nonintuitive.com>.

On Oct 8, 2004, at 4:41 PM, Kathey Marsden wrote:

> Reading a little about this it seems the only way to really detect if
> the socket is active is to attempt a write. This of course is not an
> option since it will mess up the drda protocol.  Other things I have
> looked at are Socket.setSoTimeout, which seems no good because the
> client might in fact just be sitting there doing nothing for a long  
> time.
>
> Any ideas?

I looked around and unfortunately it appears that attempting a write is  
the only way to conclusively determine that the connection is no longer  
active. It appears that this is a consequence of the design of TCP and  
is not a Java-specific problem. Check out:

http://forum.java.sun.com/thread.jsp?forum=11&thread=539297&start=0&range=15&tstart=60&trange=15

As evidence that this is not a Java-specific problem, see:

http://lists.gnu.org/archive/html/bug-commoncpp/2004-01/msg00027.html

Unfortunately, since the problem for the server occurs on the reading  
side of the transaction, it doesn't seem like there's anything the  
server can do to test the connection, like attempting to write the next  
few bytes of the transaction to the socket. Only the client would be  
able to tell that the connection had been lost. It would appear that  
implementing some sort of timeout is the only way around the problem,  
short of altering the DRDA specification to allow a way for the server  
to probe the state of the connection within the boundaries of the  
protocol. :)

If it was decided to implement some sort of timeout, obviously it would  
be nice to be able to set the length of the timeout on the server side  
so that individual application developers could decide what the length  
of an acceptable timeout for a client should be, along with similar  
functionality on the client side so that a particular transaction could  
be attempted again once the connection is again available. Also, in the  
interest of recoverability for the transaction, would it be possible  
that the server could invalidate the entire transaction initiated by  
the client once the timeout is reached, and, once the client regains  
connectivity, that the client could once again attempt the entire  
transaction?

Not sure that helps, but it's my 2 cents,
andrew

Re: Help detecting client disconnects for network server

Posted by Jonas S Karlsson <jo...@gmail.com>.

My mail seems to have some delivery problems, so I'm using this
account. Below is what I wrote on Saturday evening; apologies for any
duplicates.

-----------------------------------------------------------------

I'm no expert on DRDA but I've played and written some network
applications, so I have some ideas...

Kathey Marsden wrote:
> I am working on an issue with Network Server where the server does not
> clean up connection threads properly for disconnected clients.  I get an
> IOException if the client program is killed or aborted with <ctrl> c
> but if the network cable is just unplugged or the machine turned off, we
> don't detect the disconnect properly.

There is no disconnection. The concept of a "network connection" is
not a very good analogy, because there isn't really any connection.
Disconnection of a TCP connection is an active event by a program/OS,
and requires communication. If nobody is there to communicate, or the
communication channel is broken, then nobody is there to disconnect.
The OS closes connections/files when a program terminates. That's why
you see it. There is generally no way to verify that the other end is
there, unless it tells you that it is. There is an automatic check of
idle sockets (keepalive) that closes connections whose other end no
longer answers, but it can take as long as "hours" to kick in (OS and
implementation dependent). It can be turned on/off using
Socket.setKeepAlive(), however you can't set the time...

> So, if I do this on the client machine
>
> create table t (i int);
> autocommit off;
> lock table t in exclusive mode;
> 
> Then disconnect the cable, the server will continue to block on the
> inputStream.read() and the connection will continue to hold the lock so
> no one else can select from the table.

...and if you connect the cable again, you're likely to be able to
communicate again (assuming static IP address etc). This is a good
thing because there may be seconds when packets don't reach their
target and are being resent, or routed along different routes, and the
"connection" is then retained, unless it times out. This is the normal
operation of the internet/networks: it hiccups, a router is reset, etc.

> Reading a little about this it seems the only way to really detect if
> the socket is active is to attempt a write. This of course is not an
> option since it will mess up the drda protocol.  Other things I have
> looked at are Socket.setSoTimeout, which seems no good because the
> client might in fact just be sitting there doing nothing for a long time.

A client that does nothing for a long time while inside a
transaction with the server seems to be the case you want to catch, and
I feel that you just provided a solution yourself. When inside a
transaction (i.e., there are locked objects in the transaction), set
Socket.setSoTimeout to an acceptable value; when a timeout is
received, roll back the transaction. Typically one would like the
server to be configurable, or the client to be able to set which
timeout to use, I guess. After a commit/rollback I'd expect no objects
to be locked, so the timeout can be set really long (0 means infinite).
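
Roughly, the idea could look like the sketch below. The names readNext,
holdsLocks and TIMED_OUT are hypothetical; how the server knows that a
connection holds locks, and how it rolls back, is left to the caller:

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.Socket;
import java.net.SocketTimeoutException;

final class TransactionAwareRead {
    /** Marker returned when a lock-holding client stayed silent too long. */
    static final int TIMED_OUT = -2;

    /**
     * Reads one byte from the client. Returns the byte (0-255), -1 at end
     * of stream, or TIMED_OUT. The holdsLocks flag is an assumption: the
     * caller must track whether the current transaction holds locks.
     */
    static int readNext(Socket client, boolean holdsLocks, int lockTimeoutMillis)
            throws IOException {
        // A timeout of 0 means block forever, acceptable while nothing is locked.
        client.setSoTimeout(holdsLocks ? lockTimeoutMillis : 0);
        InputStream in = client.getInputStream();
        try {
            return in.read();
        } catch (SocketTimeoutException e) {
            // The socket is still usable; the caller would roll back here.
            return TIMED_OUT;
        }
    }
}
```

On TIMED_OUT the server would roll back the transaction and then decide
whether to keep the connection open or close it.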

Another solution would be to require the client to "ping" the server
at regular intervals, kind of like a remote "watchdog" process. On any
communication from a client, the server clears a flag for that client
or records the time when it last received communication. A server
watchdog thread can then, at a regular interval (x times longer than
the client's ping interval), check that the flag is cleared or the time
is acceptable, and if not, "kill" the client connection. If the client
is ok, the watchdog sets the flag again for the next round. The "ping"
would be any "cheap" request that the server can respond to easily,
like a "status" function, "get variable value" or such. I don't know
DRDA well enough to suggest a fitting command. Sometimes a "good
command" is to send an illegal command/string which will give an error
from the server; that too tells the server that the client is alive,
and should be "cheap" for the server, logging excluded.
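
The server side of this could be sketched as below, using the "time of last
communication" variant rather than a flag. ConnectionWatchdog, touch and
sweep are hypothetical names, and the caller passes in the clock value (for
example System.currentTimeMillis()):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class ConnectionWatchdog {
    private final Map<String, Long> lastSeenMillis = new ConcurrentHashMap<>();
    private final long maxSilenceMillis;

    ConnectionWatchdog(long maxSilenceMillis) {
        this.maxSilenceMillis = maxSilenceMillis;
    }

    /** Called by a connection thread whenever any request (or ping) arrives. */
    void touch(String connectionId, long nowMillis) {
        lastSeenMillis.put(connectionId, nowMillis);
    }

    /** Called periodically by the watchdog thread; returns and forgets silent connections. */
    List<String> sweep(long nowMillis) {
        List<String> stale = new ArrayList<>();
        for (Map.Entry<String, Long> entry : lastSeenMillis.entrySet()) {
            Long seen = entry.getValue();
            // remove(key, value) only succeeds if no touch() happened in between.
            if (nowMillis - seen > maxSilenceMillis
                    && lastSeenMillis.remove(entry.getKey(), seen)) {
                stale.add(entry.getKey());
            }
        }
        return stale;
    }
}
```

The watchdog thread would then close the socket (and roll back any open
transaction) for every id returned by sweep.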

Hope the ideas are of some use. I believe that one could do things at
lower levels, outside the TCP/IP connection, but that would require
much more work and be roughly equivalent.

/Jonas (jsk@yesco.org)

Re: Help detecting client disconnects for network server

Posted by Samuel Andrew McIntyre <fu...@nonintuitive.com>.

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


On Oct 9, 2004, at 11:06 AM, Kathey Marsden wrote:

> 	1)  Fix network server to use setKeepAlive so setting the system
> keepalive timeout will affect it.

by default, or as an option?

> 	2) Add a db2j.drda.connTimeout property so users can set an absolute
> limit on connection life.  Connections would timeout after this limit
> regardless of whether there was connectivity.

It might be nice to be able to set the timeout per-connection instead 
of (or in addition to) having a global timeout value, but maybe that's 
overkill.

andrew
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (Darwin)

iD8DBQFBaHzFDfB0XauCH7wRAqiZAKCISVkq66it1wP0GXnCOz4fvIlVEgCaAohY
Ojc66CpxZx1Zd6ijb4R8YGI=
=ZY3+
-----END PGP SIGNATURE-----

RE: Help detecting client disconnects for network server

Posted by "Noel J. Bergman" <no...@devtech.com>.

> 1)  Fix network server to use setKeepAlive so setting the
>     system keepalive timeout will affect it.
> 2) Add a db2j.drda.connTimeout property so users can set
>    an absolute limit on connection life.

We do something similar in JAMES, although I'd like to change our
implementation, where we use a watchdog to timeout inactive connections.

	--- Noel

Re: Help detecting client disconnects for network server

Posted by Kathey Marsden <km...@Sourcery.Org>.

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Jan Hlavatý wrote:

> Did you try setting the SO_KEEPALIVE socket option using
> Socket.setKeepAlive(boolean)?
> That's what it's for. The timeout may be really long though (about 2 hours),
> but may be tuned at the OS level (globally).

Yes, I tried setKeepAlive, but the timeout was too long and I didn't know
I had to set it in the OS. Thanks for the info on that.  It seems less
than optimal because it would affect other applications as well, but
maybe that's ok.

I was thinking maybe the best approach would be to:

	1) Fix network server to use setKeepAlive so setting the system
keepalive timeout will affect it.
	2) Add a db2j.drda.connTimeout property so users can set an absolute
limit on connection life.  Connections would time out after this limit
regardless of whether there was connectivity.
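
One way the absolute limit in 2) might be enforced is sketched below, on the
assumption that closing the socket from a timer thread is acceptable.
ConnectionLifetimeLimit and closeAfter are hypothetical names, and how the
limit would be read from the proposed property is left out:

```java
import java.io.IOException;
import java.net.Socket;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

final class ConnectionLifetimeLimit {
    /**
     * Schedules socket.close() after maxLifeMillis. The caller should
     * cancel the returned future if the client disconnects normally.
     */
    static ScheduledFuture<?> closeAfter(ScheduledExecutorService timer,
                                         Socket socket, long maxLifeMillis) {
        return timer.schedule(() -> {
            try {
                // A thread blocked in read() on this socket gets a SocketException.
                socket.close();
            } catch (IOException ignored) {
                // Nothing useful to do; the socket is being discarded anyway.
            }
        }, maxLifeMillis, TimeUnit.MILLISECONDS);
    }
}
```

The connection thread would catch the resulting exception from its blocked
read and do its usual cleanup, including rolling back any open transaction.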


Thoughts?

Kathey

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (MingW32)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFBaCi5G0h36bFmkocRAsd+AKCS8lAgjGZ7jONrASlWSUmpRogvlQCglOrL
FbNUxtWg8uCrQ6ouoQJwf+s=
=RSC5
-----END PGP SIGNATURE-----

Re: Help detecting client disconnects for network server

Posted by Jan Hlavatý <hl...@code.cz>.

Samuel Andrew McIntyre wrote:
> Is it really necessary to reinvent keep-alive at a higher level? Why not
> just decide that 'X' (m)secs without response indicates that a failure
> has occurred of enough significance to cancel a pending transaction? And
> if the value of 'X' is variable, that the application developer writing
> the client/server application can decide what the appropriate value of
> 'X' should be?

Well, there are legitimate reasons why a client wouldn't talk for a long time,
for example a connection pool behind a web application that is not accessed much.
These "legal" states usually occur when no transaction is active on the connection.
Maybe we could modify the timeouts to reflect that: allow the client to keep quiet
indefinitely when no transaction is active, while being more aggressive in checking the
client's presence/timeouts when a transaction is active and resources are being held by it.

Jan

Re: Help detecting client disconnects for network server

Posted by Andrew McIntyre <fu...@nonintuitive.com>.

On Oct 10, 2004, at 3:21 AM, Samuel Andrew McIntyre wrote:

> Why not just decide that 'X' (m)secs without response indicates that a  
> failure has occurred of enough significance to cancel a pending  
> transaction?

To expand on this:

What if the embedded instance that Network Server is connecting to
allowed setting an upper limit on the amount of time any transaction
can take? Then, when this limit is hit, regardless of what state the
server and client are in, the database rolls back the transaction and
the locks are released anyway. Network Server would still need to do
cleanup of the client connection, but at least that would solve the
problem of locks being held indefinitely.

A configurable upper limit to the life of any transaction that holds a  
lock seems like it would be a good thing to have, even for embedded and  
not just for Network Server. At some point, lack of communication means  
that something has gone wrong, whether it is a communications or  
systems error, and we should probably roll back the transaction and  
release the locks to let processing continue. This is what DB2 for z/OS  
does - keep track of which threads hold locks and time them out  
according to an attribute of the connection:

http://publib.boulder.ibm.com/infocenter/dzichelp/topic/com.ibm.db2.doc.admin_8.1.0/bjndmstr613.htm
(for z/OS, idle thread timeout parameter; note they also suggest
setting your keepalive value low if you're really worried about
resource consumption)

Strangely, I couldn't find anything on how Oracle or DB2 UDB would  
handle such a situation, even regarding behavior with respect to TCP  
keepalive. Does anyone know how this is handled in other databases?

I suppose this might be overly complicated to implement, considering  
what we're after, but it would address the problem.

andrew

Re: Help detecting client disconnects for network server

Posted by Samuel Andrew McIntyre <fu...@nonintuitive.com>.

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Oct 9, 2004, at 11:57 PM, Jonas S Karlsson wrote:

> Another solution, would be to require the client to "ping" the server
> at regular intervals, kind of like a remote "watchdog" process, the
> server clears a flag for that client at any communication, keeps the
> time when it last received communication.  A server watchdog thread
> can then at regular time interval (x times longer than the interval at
> the client) check that the flag is cleared/time is acceptable, and
> if not "kill" the client connection.

I just want to point out that this is almost precisely what TCP 
keep-alive tries to do at a low level. And that, in practice, it fails 
to determine anything about the state of the network between the client 
and server. This means that it is left up to the reader (i.e. the 
listener expecting response to a ping) to decide what the absence of a 
response to such a 'ping' actually means.

Is it really necessary to reinvent keep-alive at a higher level? Why 
not just decide that 'X' (m)secs without response indicates that a 
failure has occurred of enough significance to cancel a pending 
transaction? And if the value of 'X' is variable, that the application 
developer writing the client/server application can decide what the 
appropriate value of 'X' should be?

andrew
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (Darwin)

iD8DBQFBaQ1gDfB0XauCH7wRAkdNAJ4+5sGWUdZjObEjYo5ycSqDHUw4QACfdu8D
dNmyvdyqCrH6R+ETF4hFbHs=
=fPre
-----END PGP SIGNATURE-----
